Objective Gender and Age Recognition from Speech Sentences
Abstract
In this work, an automatic gender and age recognizer from speech is investigated. The features relevant to gender recognition are selected from the first four formant frequencies and twelve MFCCs and fed to an SVM classifier, while the features relevant to age are used with a k-NN classifier in the age recognition model. MATLAB is used as the simulation tool. To improve the results of both classifiers, robust features are selected according to the frequency range that each feature represents. The gender and age classification algorithms are evaluated on 114 speech samples (clean and noisy) uttered in the Kurdish language. The two-class gender recognition model (adult males and adult females) reached 96% recognition accuracy, and the three-class model (adult males, adult females, and children) achieved 94% recognition accuracy. For the age recognition model, speakers are categorized into seven groups according to their ages. After selecting the features relevant to age, the model achieved 75.3% recognition accuracy. As a further improvement, a de-noising technique is applied to the noisy speech signals, followed by selection of the features that are affected by the de-noising process, which results in 81.44% recognition accuracy.
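The age recognizer described above pairs feature selection with a k-NN classifier. The following is a minimal Python sketch of that general idea only, not the MATLAB implementation used in this work. The function names (`select_features`, `knn_predict`), the three age labels, the choice of retained feature indices, and all feature values are hypothetical and chosen purely for illustration; they are not acoustic measurements from the study.

```python
import math
from collections import Counter


def select_features(vectors, keep):
    """Keep only the feature columns listed in `keep` (hypothetical selection)."""
    return [[vec[i] for i in keep] for vec in vectors]


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training vectors nearest to `query`."""
    # Euclidean distance from the query to every training vector.
    ranked = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up 4-dimensional feature vectors: columns 0 and 2 separate the groups,
# columns 1 and 3 act as uninformative noise (an assumption for this demo).
train_X = [
    [0.9, 5.0, 2.8, 1.0],
    [1.0, 1.0, 2.9, 9.0],
    [0.6, 9.0, 2.2, 4.0],
    [0.7, 2.0, 2.3, 6.0],
    [0.4, 6.0, 1.8, 2.0],
    [0.5, 3.0, 1.9, 8.0],
]
train_y = ["child", "child", "adult", "adult", "senior", "senior"]

keep = [0, 2]  # hypothetical indices of the "relevant" features
query = [0.65, 7.0, 2.25, 1.5]

reduced_train = select_features(train_X, keep)
reduced_query = select_features([query], keep)[0]

prediction = knn_predict(reduced_train, train_y, reduced_query, k=3)
print(prediction)
```

In this toy data, discarding the two noisy columns before computing distances is what lets the nearest neighbours reflect the informative features, which mirrors the motivation for selecting relevant features before classification.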
Copyright (c) 2015 Fatima K. Faek
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.