Gender Prediction of Journalists from Writing Style

  • Peshawa J. Muhammad Ali Department of Software Engineering, Faculty of Engineering, Koya University, University Park, Danielle Mitterrand Boulevard, Koya KOY45, Kurdistan Region.
  • Nigar M. Shafiq Surameery Department of Software Engineering, Faculty of Engineering, Koya University, University Park, Danielle Mitterrand Boulevard, Koya KOY45, Kurdistan Region.
  • Abdul-Rahman Mawlood Yunis Canada Revenue Agency, Ottawa, Ontario, Canada.
  • Ladeh Sardar Abdulrahman Department of Software Engineering, Faculty of Engineering, Koya University, University Park, Danielle Mitterrand Boulevard, Koya KOY45, Kurdistan Region.
Keywords: Gender identification, Kurdish media, Neural networks, Text mining

Abstract

Web-based Kurdish media have grown tangibly in the last few years. Several factors have contributed to this rapid growth, including easy access to the internet, the low price of electronic gadgets, and the pervasive use of social networking. The swift development of web-based Kurdish media poses new challenges that need to be addressed. For example, a newspaper article published online has properties such as the author's name, gender, age, and nationality, among others. Determining one or more of these properties computationally when they are ambiguous is an important open research area. In this study, the gender of journalists in web-based Kurdish media is determined using computational linguistics and text mining techniques. Seventy-five web-based Kurdish articles, downloaded from four well-known web-based Kurdish newspapers, were used to train an artificial neural network model designed to determine the journalist's gender. Sixty-one features, chosen for their ability to discriminate between genders, were extracted from each article. A Multi-Layer Perceptron (MLP) artificial neural network was used as the classifier, and it achieved an accuracy of 76%.
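The abstract compresses the method into two steps: extract stylometric features from each article, then classify the feature vectors with an MLP. The following is a minimal pure-Python sketch of that pipeline under loudly stated assumptions. The three features (average word length, type-token ratio, average sentence length) are hypothetical stand-ins, not the authors' 61 features. The punctuation set, the hidden-layer size, the learning rate, the epoch count, and the label encoding are also illustrative choices. The names `extract_features`, `standardize`, `MLP` and `accuracy` are likewise illustrative, not taken from the paper.

```python
import math
import random

# Hypothetical stand-ins for the paper's 61 stylometric features, which are not
# listed here. The punctuation set (including the Arabic-script "؟" and "،",
# assumed to occur in Sorani Kurdish text) is also an assumption.
PUNCT = ".,!?؟،"


def extract_features(text: str) -> list[float]:
    """Return [average word length, type-token ratio, average sentence length]."""
    words = [w.strip(PUNCT) for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return [0.0, 0.0, 0.0]
    avg_word_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len(set(words)) / len(words)
    normalized = text.replace("!", ".").replace("?", ".").replace("؟", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    avg_sentence_len = len(words) / max(len(sentences), 1)
    return [avg_word_len, type_token_ratio, avg_sentence_len]


def standardize(X: list[list[float]]) -> list[list[float]]:
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class MLP:
    """One hidden layer of sigmoid units and one sigmoid output unit, trained
    with per-example backpropagation on squared error. Labels are 0 or 1;
    which gender is encoded as 1 is an arbitrary, hypothetical choice."""

    def __init__(self, n_in: int, n_hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x: list[float]) -> tuple[list[float], float]:
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def predict_proba(self, x: list[float]) -> float:
        return self._forward(x)[1]

    def fit(self, X, labels, epochs: int = 1000, lr: float = 0.5) -> "MLP":
        for _ in range(epochs):
            for x, t in zip(X, labels):
                h, y = self._forward(x)
                d_out = (y - t) * y * (1.0 - y)  # output-layer error term
                # Hidden error terms use the output weights from before this update.
                d_hid = [d_out * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
                for j, hj in enumerate(h):
                    self.w2[j] -= lr * d_out * hj
                self.b2 -= lr * d_out
                for j, dj in enumerate(d_hid):
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * dj * xi
                    self.b1[j] -= lr * dj
        return self


def accuracy(model: MLP, X, labels) -> float:
    """Fraction of examples whose prediction, thresholded at 0.5, matches the label."""
    hits = sum((model.predict_proba(x) >= 0.5) == (t == 1) for x, t in zip(X, labels))
    return hits / len(labels)
```

In use, one would build `X = standardize([extract_features(a) for a in articles])` over labelled articles, fit on a training split, and report `accuracy` on held-out articles. The 76% figure above comes from the authors' own features and setup, not from this sketch.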

Author Biographies

Peshawa J. Muhammad Ali, Department of Software Engineering, Faculty of Engineering, Koya University, University Park, Danielle Mitterrand Boulevard, Koya KOY45, Kurdistan Region.
Peshawa J. Muhammad Ali has been a lecturer and researcher at the Department of Software Engineering, Koya University, since 2006. He holds a B.Sc. in Civil Engineering and an M.Sc. in Computer Science. His main research areas are data mining and machine learning, and he has published several articles on neural networks. Mr. Peshawa is also a Consultant Civil Engineer at the Kurdistan Engineering Union and has experience in this field.
Nigar M. Shafiq Surameery, Department of Software Engineering, Faculty of Engineering, Koya University, University Park, Danielle Mitterrand Boulevard, Koya KOY45, Kurdistan Region.
Ladeh Sardar Abdulrahman, Department of Software Engineering, Faculty of Engineering, Koya University, University Park, Danielle Mitterrand Boulevard, Koya KOY45, Kurdistan Region.

Published
2013-11-20
How to Cite
Muhammad Ali, P. J., Shafiq Surameery, N. M., Yunis, A.-R. M. and Abdulrahman, L. S. (2013) “Gender Prediction of Journalists from Writing Style”, ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY, 1(1), pp. 22-28. doi: 10.14500/aro.10031.
Section
Articles