A REVIEW ON VOICE ACTIVITY DETECTION AND MEL-FREQUENCY CEPSTRAL COEFFICIENTS FOR SPEAKER RECOGNITION (TREND ANALYSIS)

P. MAHALAKSHMI

doi:10.22159/ajpcr.2016.v9s3.14352

Authors

P. MAHALAKSHMI, VIT University, Vellore - 632 014, India.

Abstract

Objective: This article reviews the techniques that have been used for speech recognition over the past two decades.
Methods: Voice activity detection (VAD) and speech activity detection (SAD) techniques, which are used to distinguish voiced from unvoiced signals, are discussed, along with the Mel-frequency cepstral coefficient (MFCC) technique, which detects specific features.
Results: The review results show that research in MFCC has been dominant in signal processing in comparison to VAD and other existing techniques.
Conclusion: Speaker recognition techniques used previously were compared with those in current research, and the better technique was identified through a review of the literature spanning more than two decades.
Keywords: Cepstral analysis, Mel-frequency cepstral coefficients, signal processing, speaker recognition, voice activity detection.

Author Biography

P. MAHALAKSHMI, School of Electrical Engineering, VIT University, Vellore - 632 014, India.
