A REVIEW ON VOICE ACTIVITY DETECTION AND MEL-FREQUENCY CEPSTRAL COEFFICIENTS FOR SPEAKER RECOGNITION (TREND ANALYSIS)


P Mahalakshmi

Abstract


Objective: This article reviews the techniques that have been used for speech recognition over the past two decades.
Methods: Voice activity detection (VAD) and speech activity detection (SAD) techniques, which distinguish voiced from unvoiced signals, are discussed, together with the Mel-frequency cepstral coefficient (MFCC) technique, which extracts specific features from the signal.
Results: The review shows that research on MFCC has been dominant in signal processing compared with VAD and other existing techniques.
Conclusion: Speaker recognition techniques from earlier work and from current research are compared, and the better technique is identified through a review of the literature spanning more than two decades.
Keywords: Cepstral analysis, Mel-frequency cepstral coefficients, signal processing, speaker recognition, voice activity detection.
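
Illustrative sketch (not taken from the article): the two ideas named in the Methods can be shown in a few lines of Python. The first part is a simple short-time energy detector, one of many possible VAD schemes; the frame length, hop size, and threshold ratio are assumed values chosen only for this example. The second part is a commonly used Hz-to-mel conversion on which MFCC filterbanks are typically built. The function names are hypothetical.

```python
import math

def frame_energies(signal, frame_len, hop):
    """Mean squared amplitude of each full frame of the signal."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def energy_vad(signal, frame_len=160, hop=80, ratio=0.1):
    """Mark each frame True (active) or False (inactive).

    Assumption: the threshold is a fixed fraction of the peak frame
    energy. Practical detectors usually adapt it to the noise level.
    """
    energies = frame_energies(signal, frame_len, hop)
    if not energies:
        return []
    threshold = ratio * max(energies)
    return [e > threshold for e in energies]

def hz_to_mel(freq_hz):
    """Common mel-scale formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

# Synthetic 8 kHz signal: near-silence, a 200 Hz tone, near-silence.
rate = 8000
quiet = [0.001 * math.sin(2 * math.pi * 50 * n / rate) for n in range(800)]
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
decisions = energy_vad(quiet + tone + quiet)
```

In this toy signal only the frames overlapping the tone are marked active. A full MFCC front end would add windowing, a Fourier transform, a mel filterbank, a logarithm, and a discrete cosine transform, which are omitted here.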







About this article

DOI

10.22159/ajpcr.2016.v9s3.14352

Date

01-12-2016

Journal

Asian Journal of Pharmaceutical and Clinical Research
Vol 9 Suppl 3 December 2016 Page: 360-363

Print ISSN

0974-2441

Online ISSN

2455-3891

Authors & Affiliations

P Mahalakshmi
VIT University, Vellore - 632 014, India.

