Big Data Benchmarking and Performance Optimization on Cloud Infrastructure

Alessandro Ricci

Department of Computer Science and Engineering, University of Bologna, Italy

Viviana Cortiana

University of Bologna

Keywords: Big Data, Cloud Computing, Benchmarking Tools, Performance Optimization, Benchmark Suites, Machine Learning and Deep Learning


Abstract

With the rapid growth of big data, cloud computing has emerged as an attractive solution for storing and processing large datasets. However, benchmarking and optimising the performance of big data systems on cloud infrastructure remains a key challenge. This paper provides a comprehensive review of big data benchmarking tools, performance optimisation techniques, and recent advances in this domain. We first introduce the characteristics of big data that call for new benchmarking approaches. We then present an overview of popular big data benchmark suites, such as TPCx-BB, YCSB, GridMix, BigBench, and BigDataBench, and discuss their capabilities, metrics, workloads, and limitations. Next, we review performance optimisation strategies for big data on the cloud, including resource provisioning, data placement, partitioning, compression, and query optimisation, and analyse experimental results from applying these techniques on cloud platforms such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud. We also highlight studies that use machine learning and deep learning to automate and improve big data performance tuning. Finally, we outline open challenges and future directions for big data benchmarking and optimisation on cloud infrastructure. As cloud adoption continues to grow, this survey is intended as a practical guide for researchers and practitioners who need to evaluate and tune big data systems on the cloud efficiently.

