A novel voiceprint using ensembled Mel-Chromagram for speaker recognition
Keywords:
mel-spectrogram, MFCC, chromagram, LSTM, keyword-dependent speaker recognition, voiceprint
Abstract
This paper proposes a novel voiceprint generation methodology for recognizing speakers registered in a system, formulated as a keyword-dependent, closed-set speaker classification task. The features used are the Mel-spectrogram, the chromagram, MFCCs, and a new ensembled feature called Mel-Chroma, generated by combining the Mel-spectrogram and the chromagram. The resulting Mel-Chroma spectrogram is converted into a binary image using its average value as the threshold. A recurrent neural network (LSTM) performs the classification, and the dataset used is the Free Spoken Digit Dataset (FSDD). The proposed method achieves higher accuracy than state-of-the-art methods for this specific task: speaker classification with the binary Mel-Chroma voiceprint reaches 98.33%.
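The paper itself does not include an implementation. The sketch below is a minimal illustration, using librosa and Keras, of how such a binary Mel-Chroma voiceprint might be assembled and passed to an LSTM classifier. The frame parameters, the min-max normalisation, the frequency-axis stacking, and the layer sizes are assumptions made for illustration, not the authors' exact configuration.

```python
import numpy as np
import librosa
import tensorflow as tf


def mel_chroma_voiceprint(path, sr=8000, n_mels=64, n_fft=512, hop_length=128):
    """Build a binary Mel-Chroma voiceprint from a short utterance.

    Assumptions (not taken from the paper): the mel-spectrogram (in dB) and the
    chromagram are each min-max normalised, stacked along the frequency axis,
    and thresholded at the stack's mean value, as the abstract describes.
    """
    y, sr = librosa.load(path, sr=sr)  # FSDD recordings are 8 kHz mono WAV files

    # Mel-spectrogram in decibels
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                         n_fft=n_fft, hop_length=hop_length)
    mel_db = librosa.power_to_db(mel, ref=np.max)

    # 12-bin chromagram computed over the same frames
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length)

    # Scale each feature to [0, 1] so neither dominates the ensemble
    def minmax(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-9)

    mel_chroma = np.vstack([minmax(mel_db), minmax(chroma)])

    # Binarise with the average as threshold
    return (mel_chroma > mel_chroma.mean()).astype(np.uint8)


def build_lstm_classifier(n_features, n_speakers):
    """Minimal LSTM classifier sketch; treats time frames as the sequence axis.

    The single 128-unit LSTM layer is an illustrative choice, not the paper's
    reported architecture.
    """
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(None, n_features)),  # (frames, features)
        tf.keras.layers.LSTM(128),
        tf.keras.layers.Dense(n_speakers, activation="softmax"),
    ])
```

In this sketch the voiceprint would be transposed to shape (frames, features) before being fed to the classifier, and the model compiled with a categorical cross-entropy loss for the closed-set speaker labels.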