Objective Gender and Age Recognition from Speech Sentences
Abstract
This work investigates an automatic recognizer of gender and age from speech. For gender recognition, the relevant features are selected from the first four formant frequencies and twelve MFCCs and fed to an SVM classifier. For age recognition, the features relevant to age are used with a k-NN classifier. MATLAB is used as the simulation tool. To improve the results of both classifiers, robust features are selected according to the frequency range that each feature represents. The gender and age classification algorithms are evaluated on 114 clean and noisy speech samples uttered in the Kurdish language. The two-class gender recognition model (adult males and adult females) reached 96% recognition accuracy, while the three-class model (adult males, adult females, and children) achieved 94%. For age recognition, speakers are divided into seven age groups; after selecting the features relevant to age, the model achieved 75.3% accuracy. As a further improvement, a de-noising technique is applied to the noisy speech signals, followed by selection of the features affected by the de-noising process, which results in 81.44% recognition accuracy.
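The k-NN step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' MATLAB implementation: the function name `knn_predict`, the value `k=3`, the Euclidean distance, the two-dimensional feature vectors, and the group labels are all assumptions made for illustration, and the data are synthetic rather than taken from the Kurdish speech samples.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training vectors nearest to x."""
    # Euclidean distance from x to every training vector, paired with its label.
    dists = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    # Majority vote over the k nearest neighbours.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Synthetic 2-D feature vectors (NOT the paper's data); the labels stand in
# for hypothetical age groups.
train_X = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
           (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = ["group_A"] * 3 + ["group_B"] * 3

pred_a = knn_predict(train_X, train_y, (1.1, 1.0))
pred_b = knn_predict(train_X, train_y, (2.9, 3.1))
```

In a real system each vector would hold the selected acoustic features for one utterance, and the seven age groups would replace the two placeholder labels used here.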
Copyright (c) 2016 Fatima K. Faek
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.