Speech emotion recognition using machine learning

Abhishek Kumar Saw; Chetna Arya; Devbrat Sahu; Shweta Shrivas

doi:10.53730/ijhs.v6nS1.8662

Authors

Abhishek Kumar Saw, Assistant Professor, Department of Computer Science & Engineering, Shri Shankaracharya Institute of Professional Management and Technology, Raipur, Chhattisgarh, India
Chetna Arya, B. Tech (Scholar), Department of Computer Science & Engineering, Shri Shankaracharya Institute of Professional Management and Technology, Raipur, Chhattisgarh, India
Devbrat Sahu, Assistant Professor, Department of Computer Science & Engineering, Shri Shankaracharya Institute of Professional Management and Technology, Raipur, Chhattisgarh, India
Shweta Shrivas, B. Tech (Scholar), Department of Computer Science & Engineering, Shri Shankaracharya Institute of Professional Management and Technology, Raipur, Chhattisgarh, India

Keywords:

Artificial Neural Network (ANN), Machine Learning (ML), Multilayer Perceptron (MLP), Support Vector Machine (SVM), dataset, Speech Emotion Recognition (SER), acoustic features

Abstract

Humans connect with each other through language, and spoken words play an important role in communication. This project aims to determine the emotion behind spoken words. Speech Emotion Recognition (SER) is a system that determines emotions from live audio. People around the globe use speech to convey emotion, irrespective of their background. Recognizing emotion from human speech is challenging because many factors play an important role in the formation of an emotion. SER is one of the growing fields in human-machine interaction. Prediction tasks of this kind mostly rely on sub-domains of artificial intelligence; this project uses machine learning (ML), which applies an algorithm to a dataset to predict or detect future outcomes. We propose applying an Artificial Neural Network (ANN) to determine emotion. An ANN is modeled on how the biological brain works: it consists of interconnected neurons, called nodes. The classifiers used in this project are the Multilayer Perceptron (MLP), decision tree, support vector machine (SVM), and random forest. Speech emotion recognition using machine learning follows a series of steps to reach a result. First, a dataset is needed to train the program.
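
The abstract's description of an ANN as a set of interconnected nodes can be illustrated with a minimal sketch of a Multilayer Perceptron forward pass. Everything specific here is an assumption made for illustration and is not taken from the paper: the emotion labels, the two-feature input, the layer sizes, and the hand-picked, untrained weights. In practice the weights would be learned from a labelled dataset of acoustic features.

```python
import math

# Hypothetical emotion labels; the abstract does not list the classes used.
EMOTIONS = ["calm", "happy", "angry"]


def relu(values):
    # Hidden-node activation: negative sums are clipped to zero.
    return [max(0.0, v) for v in values]


def softmax(values):
    # Convert output-node scores to probabilities (max subtracted for stability).
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    # One fully connected layer: each node computes a weighted sum of all inputs.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def mlp_forward(features, hidden_w, hidden_b, out_w, out_b):
    # Input features -> hidden nodes -> one output node per emotion.
    hidden = relu(dense(features, hidden_w, hidden_b))
    return softmax(dense(hidden, out_w, out_b))


# Illustrative, untrained weights: 2 input features -> 3 hidden nodes -> 3 emotions.
HIDDEN_W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
HIDDEN_B = [0.0, 0.0, 0.0]
OUT_W = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
OUT_B = [0.0, 0.0, 0.0]


def predict(features):
    # Return the most probable label together with the full distribution.
    probs = mlp_forward(features, HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs


if __name__ == "__main__":
    label, probs = predict([2.0, 0.0])
    print(label, [round(p, 3) for p in probs])
```

With these made-up weights, `predict([2.0, 0.0])` selects the label "happy"; the result says nothing about real speech and only shows how values flow from input nodes through hidden nodes to a probability per emotion.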
