Examining Heterogeneity Structured on a Large Data Volume with Minimal Incompleteness

Keywords: Heterogeneous dataset, Bitcoin transactions, Bitcoin tweets


While Big Data analytics can provide a variety of benefits, processing heterogeneous data comes with its own set of limitations. Because transaction patterns must be studied independently when working with Bitcoin data, this study examines Twitter data related to Bitcoin and investigates communication patterns in Bitcoin transaction tweets. A large volume of tweets carrying the hashtags #Bitcoin or #BTC was gathered and mined to uncover the patterns that users (speculators, teachers, or stakeholders) follow when discussing Bitcoin transactions on Twitter. The aim is to determine the direction of Bitcoin transaction tweets based on historical data. To that end, this research proposes using Big Data analytics to track Bitcoin transaction communications in tweets in order to discover a pattern. The MapReduce framework of the Hadoop platform was used. The findings indicate that, in the map step of the procedure, Hadoop tokenizes the dataset and passes the tokens to the mapper, where thirteen patterns were established; these were then reduced to three patterns using attributes previously stored in the Hadoop context. One of these is the emoji data that was left out of previous research discussions. The text, however, is only one piece of the puzzle of Bitcoin transaction interaction, and its key part is “No certainty, only possibilities” in Bitcoin transactions.
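The tokenize, map, and reduce flow described in the abstract can be illustrated with a small in-memory sketch. This is not the study's Hadoop job: the token pattern, the function names (`map_tweet`, `reduce_counts`, `run_job`), and the sample tweets are all hypothetical, and the thirteen and three patterns reported in the paper are not reproduced here. The sketch only shows how a map step can emit one pair per token, emoji included, and how a reduce step can aggregate them.

```python
import re
from collections import Counter
from itertools import chain

# Hypothetical token pattern (an assumption, not taken from the paper):
# a hashtag, a plain word, or any single non-space symbol.
# The last alternative keeps single-code-point emoji as their own tokens.
TOKEN_RE = re.compile(r"#\w+|\w+|[^\w\s]")


def map_tweet(tweet):
    """Map step: emit a (token, 1) pair for every token in one tweet."""
    return [(token.lower(), 1) for token in TOKEN_RE.findall(tweet)]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each distinct token."""
    totals = Counter()
    for token, count in pairs:
        totals[token] += count
    return totals


def run_job(tweets):
    """Run the map step over every tweet, then reduce all emitted pairs."""
    return reduce_counts(chain.from_iterable(map_tweet(t) for t in tweets))


if __name__ == "__main__":
    # Made-up sample tweets, used only to exercise the sketch.
    sample = ["#Bitcoin to the moon 🚀", "Selling my #BTC 🚀 today"]
    for token, count in run_job(sample).most_common(3):
        print(token, count)
```

On a real cluster the map and reduce functions would run as distributed tasks over the tweet files; the in-memory version keeps the same two-phase shape so the role of each step is easy to see.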



Author Biography

Nahla Aljojo, Department of Information system and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia

Nahla ALJOJO obtained her PhD in Computing at Portsmouth University. She is currently working as an Associate Professor at the College of Computer Science and Engineering, Information System and Information Technology Department, University of Jeddah, Jeddah, Saudi Arabia. Her research interests include adaptivity in web-based educational systems, eBusiness, leadership studies, information security and data integrity, eLearning, education, machine learning, deep learning, networking, health informatics, environment and ecology, and logistics and supply chain management. Her contributions have been published in prestigious peer-reviewed journals.



How to Cite
Aljojo, N. (2021) “Examining Heterogeneity Structured on a Large Data Volume with Minimal Incompleteness”, ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY, 9(2), pp. 30-37. doi: 10.14500/aro.10857.