Data Analytics and Techniques

A Review

Keywords: Big Data Analysis, Data Analytics, Data Analysis, Data Management, Machine Learning

Abstract

Big data of different types, such as text and images, is generated rapidly by the internet and other applications. Traditional methods are impractical for handling this data because it varies in size and type and in the speed at which it must be processed. Data analytics has therefore become an essential tool for big data applications, since it analyzes and extracts only the meaningful information. This paper presents several innovative methods that use data analytics techniques to improve the analysis process and data management. It also discusses how the revolution in data analytics based on artificial intelligence algorithms might improve many applications. In addition, critical challenges and research issues are identified from the limitations of published papers, to help researchers distinguish between analytics techniques and develop highly consistent, logical, and information-rich analyses based on valuable features. Finally, the findings of this paper may be used to identify the best methods in each sector covered by these publications, to assist future researchers in conducting more systematic and comprehensive analyses, and to identify areas for developing a unique or hybrid technique for data analysis.

Author Biographies

Safa S. Abdul-Jabbar, Department of Computer Science, College of Science for Women, University of Baghdad, Baghdad, Iraq

Safa Sami Abdul-Jabbar is an Assistant Lecturer at the Department of Computer Science, University of Baghdad. She received her B.Sc. and M.Sc. degrees in computer science from the College of Science for Women, University of Baghdad, in 2009 and 2017, respectively. Her research interests include data mining, data analytics, big data, security, and cloud computing.

Alaa K. Farhan, Department of Computer Science, University of Technology, Baghdad, Iraq

Alaa Kadhim Farhan is a Professor at the Department of Computer Sciences, University of Technology, Baghdad. He received his B.Sc. degree in Computer Science and his M.Sc. degree in Information Security from the Department of Computer Sciences, University of Technology, Baghdad, Iraq, in 2003 and 2005, respectively. He received his Ph.D. degree in Information Security from the University of Technology, Baghdad, in 2009. He is a member of the IEEE. His research interests include cryptography, programming languages, chaos theory, and cloud computing.

Published
2022-10-08
How to Cite
Abdul-Jabbar, S. S. and Farhan, A. K. (2022) “Data Analytics and Techniques: A Review”, ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY, 10(2), pp. 45-55. doi: 10.14500/aro.10975.
Section
Review Articles