Gender Prediction of Journalists from Writing Style
Abstract
Web-based Kurdish media have grown noticeably in the last few years. Several factors have contributed to this rapid growth, including easy access to the internet, the low price of electronic devices, and the pervasive use of social networking. The swift development of Kurdish web-based media poses new challenges that need to be addressed. For example, a newspaper article published online has properties such as the author's name, gender, age, and nationality, among others. Determining one or more of these properties computationally when they are ambiguous is an important open research area. In this study, the journalist's gender in web-based Kurdish media is determined using computational linguistics and text mining techniques. Seventy-five web-based Kurdish articles were used to train an artificial model designed to determine the gender of journalists in web-based Kurdish media. The articles were downloaded from four well-known web-based Kurdish newspapers. Sixty-one features that help discriminate between genders were extracted from each article. A Multi-Layer Perceptron (MLP) artificial neural network was used as the classifier, and the accuracy achieved was 76%.
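As a rough illustration of the feature-extraction step described in the abstract, the Python sketch below computes a handful of generic stylometric features for one article. The abstract does not list the 61 features used in the study, so the function name `stylometric_features` and the six features shown here are assumptions chosen for illustration, not the authors' feature set.

```python
import re
from collections import Counter


def stylometric_features(text):
    """Compute a few illustrative writing-style features for one article.

    Hypothetical feature set: the study's actual 61 features are not
    listed in the abstract.
    """
    # \w+ is Unicode-aware in Python 3, so it matches letters in
    # Arabic-script as well as Latin-script text.
    words = re.findall(r"\w+", text.lower())
    # Split on Latin and Arabic-script sentence terminators.
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    n_words = len(words)
    n_chars = len(text)
    counts = Counter(words)
    # Words that occur exactly once in the article.
    hapaxes = sum(1 for c in counts.values() if c == 1)
    # Count both the Latin comma and the Arabic comma.
    commas = text.count(",") + text.count("،")
    return {
        "word_count": float(n_words),
        "avg_word_length": sum(map(len, words)) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "type_token_ratio": len(counts) / n_words if n_words else 0.0,
        "hapax_ratio": hapaxes / n_words if n_words else 0.0,
        "comma_ratio": commas / n_chars if n_chars else 0.0,
    }


if __name__ == "__main__":
    sample = "The cat sat. The dog ran, fast!"
    for name, value in stylometric_features(sample).items():
        print(f"{name}: {value:.3f}")
```

In a pipeline like the one the abstract outlines, one such feature vector per article, together with the known gender label, would form the training data for a classifier such as an MLP.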
Copyright (c) 2013 Peshawa J. Muhammad Ali, Nigar M. Shafiq Surameery, Abdul-Rahman Mawlood Yunis, Ladeh Sardar Abdulrahman
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.