Gender Prediction of Journalists from Writing Style
Abstract
Web-based Kurdish media have grown substantially in the last few years. Several factors have contributed to this rapid growth, including easy access to the internet, the low price of electronic devices, and the pervasive use of social networking. The swift development of Kurdish web-based media poses new challenges that need to be addressed. For example, a newspaper article published online has properties such as the author's name, gender, age, and nationality, among others. Determining one or more of these properties computationally when they are ambiguous is an important open research area. In this study, the gender of journalists in web-based Kurdish media is determined using computational linguistics and text mining techniques. Seventy-five web-based Kurdish articles were used to train an artificial neural network model designed to determine the journalist's gender. The articles were downloaded from four well-known web-based Kurdish newspapers. Sixty-one features were extracted from each article; these features are distinctive in discriminating between genders. A Multi-Layer Perceptron (MLP) artificial neural network was used as the classifier, and the accuracy obtained was 76%.
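The feature-extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not list the 61 features, so the five features below, the function name `extract_style_features`, and the sentence-splitting rule are all assumptions chosen for illustration. They are common stylometric measures and not necessarily those used in the study.

```python
import re


def extract_style_features(text):
    """Return a few illustrative stylometric features for one article.

    Hypothetical feature set: the study's actual 61 features are not
    given in the abstract.
    """
    # \w is Unicode-aware in Python 3, so both Arabic-script and
    # Latin-script Kurdish tokens are matched.
    words = re.findall(r"\w+", text)
    # Assumed sentence boundaries: '.', '!', '?', and the Arabic
    # question mark (U+061F).
    sentences = [s for s in re.split(r"[.!?\u061F]+", text) if s.strip()]
    n_words = len(words)
    n_sentences = len(sentences)
    # Count both the Latin comma and the Arabic comma (U+060C).
    n_commas = text.count(",") + text.count("\u060C")
    return {
        "word_count": float(n_words),
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / n_sentences if n_sentences else 0.0,
        "type_token_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
        "comma_rate": n_commas / n_words if n_words else 0.0,
    }


if __name__ == "__main__":
    sample = "The cat sat. The cat ran, fast!"
    for name, value in extract_style_features(sample).items():
        print(name, round(value, 3))
```

In a pipeline like the one the abstract outlines, each article would yield one such fixed-length feature vector, and the labelled vectors would then be passed to a classifier such as an MLP.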
Copyright (c) 2016 Peshawa J. Muhammad Ali, Nigar M. Shafiq Surameery, Abdul-Rahman Mawlood Yunis, Ladeh Sardar Abdulrahman
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.