Data Analytics and Techniques
A Review
Abstract
Big data of different types, such as text and images, is generated rapidly by the internet and other applications. Traditional methods are impractical for handling this data because it varies in size, type, and required processing speed. Data analytics has therefore become an essential tool for big data applications, since it analyzes the data and extracts only the meaningful information. This paper presents several innovative methods that use data analytics techniques to improve the analysis process and data management. It also discusses how advances in data analytics based on artificial intelligence algorithms might improve many applications. In addition, critical challenges and research issues are identified from the limitations of the published papers, to help researchers distinguish between analytics techniques and develop consistent, logical, and information-rich analyses based on valuable features. Finally, the findings of this paper may be used to identify the best methods in each sector covered by these publications, to assist future researchers in conducting more systematic and comprehensive analyses, and to identify areas for developing a unique or hybrid technique for data analysis.
Copyright (c) 2022 Safa S. AbdulJabbar, Alaa k. Farhan
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.