Analysis of audio signal using various transforms for enhanced audio processing
Keywords:
discrete Fourier transform (DFT), discrete sine transform (DST), discrete cosine transform (DCT), modified discrete cosine transform (MDCT), integer modified discrete cosine transform (integer MDCT)
Abstract
Audio signals are representations of sound. Much of their information is carried by their frequency content rather than by the raw time-domain waveform, so it is more appropriate to analyze them in the frequency domain than in the time domain. Using transforms such as the DFT, DST, DCT, MDCT, and integer MDCT, a time-domain audio signal can be converted into a frequency-domain signal. The signal is then reconstructed, and the mean square error (MSE), signal-to-noise ratio (SNR), and peak signal-to-noise ratio (PSNR) between the original and reconstructed signals are computed. Other features, namely energy, entropy, and zero-crossing rate (ZCR), are also considered in the evaluation. In this paper, audio files in several formats are analyzed: WAV, MP3, M4A, and AAC. WAV is an uncompressed format, whereas MP3, M4A, and AAC are compressed formats that use lossy compression. The features above are used in applications such as music information retrieval (MIR). MIR tasks include onset detection, pitch detection, and the measurement of noise and loudness in music.
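The evaluation described in the abstract (transform, reconstruct, then compare) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an orthonormal DCT-II built directly with NumPy, a reconstruction that keeps only the largest-magnitude coefficients, samples normalized to [-1, 1] (so the PSNR peak is 1.0), and entropy taken as the Shannon entropy of the normalized coefficient energies. All function names and the synthetic two-tone test signal are hypothetical and chosen only for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); its transpose is its inverse."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    t = np.arange(n)[None, :]   # time index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)     # scale the DC row so the matrix is orthonormal
    return c

def reconstruct_top_k(x, keep):
    """Transform x, zero all but the `keep` largest coefficients, invert."""
    c = dct_matrix(len(x))
    coeffs = c @ x
    smallest = np.argsort(np.abs(coeffs))[: len(x) - keep]
    coeffs[smallest] = 0.0
    return c.T @ coeffs

def mse(x, y):
    return float(np.mean((x - y) ** 2))

def snr_db(x, y):
    # Signal energy over error energy, in decibels.
    return float(10 * np.log10(np.sum(x ** 2) / np.sum((x - y) ** 2)))

def psnr_db(x, y, peak=1.0):
    # Assumes samples are normalized so that the peak amplitude is `peak`.
    return float(10 * np.log10(peak ** 2 / mse(x, y)))

def energy(x):
    return float(np.sum(x ** 2))

def zero_crossing_rate(x):
    # Fraction of adjacent sample pairs whose signs differ.
    signs = np.signbit(x)
    return float(np.mean(signs[1:] != signs[:-1]))

def coefficient_entropy(x):
    # Assumption: Shannon entropy (bits) of normalized DCT coefficient energies.
    p = (dct_matrix(len(x)) @ x) ** 2
    p = p / np.sum(p)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Synthetic stand-in for one audio frame: two sinusoids sampled at 8 kHz.
fs = 8000
t = np.arange(256) / fs
frame = 0.6 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1000 * t)

for keep in (8, 32, 128):
    rec = reconstruct_top_k(frame, keep)
    print(keep, mse(frame, rec), snr_db(frame, rec), psnr_db(frame, rec))
print(energy(frame), zero_crossing_rate(frame), coefficient_entropy(frame))
```

Because the transform matrix is orthonormal, the reconstruction error energy equals the energy of the discarded coefficients, so keeping more coefficients can only lower the MSE and raise the SNR. The same metric functions could be applied to a DFT, DST, or MDCT reconstruction by swapping the transform step.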
Copyright (c) 2022 International Journal of Health Sciences

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.