Big Data Benchmarking and Performance Optimization on Cloud Infrastructure
Alessandro Ricci
Department of Computer Science and Engineering, University of Bologna, Italy
Viviana Cortiana
University of Bologna
Keywords: Big Data, Cloud Computing, Benchmarking Tools, Performance Optimization, Benchmark Suites, Machine Learning and Deep Learning
Abstract
With the rapid growth of big data, cloud computing has emerged as an attractive solution for storing and processing large datasets. However, benchmarking and optimising the performance of big data systems on cloud infrastructure remains a key challenge. This paper reviews big data benchmarking tools, performance optimisation techniques, and recent advances in this domain. We first introduce the characteristics of big data that necessitate new benchmarking approaches. We then present an overview of popular big data benchmark suites such as TPCx-BB, YCSB, GridMix, BigBench, and BigDataBench, and discuss their capabilities, metrics, workloads, and limitations. Next, we review performance optimisation strategies for big data on the cloud, including resource provisioning, data placement, partitioning, compression, and query optimisation, and analyse experimental results of applying these techniques on cloud platforms such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud. We also highlight research studies that employ machine learning and deep learning to automate and improve big data performance. Finally, we outline open challenges and future directions for big data benchmarking and optimisation on cloud infrastructure. As cloud adoption continues to grow, this survey serves as a practical guide for researchers and practitioners aiming to evaluate and tune big data systems on the cloud efficiently.