A Comparative Study of Scalability and Performance in NoSQL Databases for Big Data Storage and Retrieval

Elena Popescu

Department of Computer Science, University of Timișoara, Romania

Andrei Radu

Department of Information Technology, University of Sibiu, Romania

Keywords: NoSQL Database, Document Store, Key-Value Store, Wide-Column Store, Graph Database, Scalable Database, Flexible Schema, High Availability


Abstract

As data volumes grow, traditional relational databases struggle to meet the scalability, performance and availability demands of modern workloads. This has led to the rapid emergence of NoSQL databases designed specifically for the scale and throughput requirements of big data. This paper presents a comparative study of four popular NoSQL databases: MongoDB, Cassandra, HBase and Couchbase. Benchmarks run with the Yahoo! Cloud Serving Benchmark (YCSB) framework evaluate their scalability on datasets from 10 GB to 1 TB and their performance across read-heavy, write-heavy and mixed workloads. Cassandra achieves the highest throughput at all dataset sizes, with MongoDB second; Couchbase and HBase deliver lower throughput than both. Cassandra keeps latency below 70 ms even at the 1 TB scale, while the other databases exhibit higher latencies as data volumes increase. Across read, write and mixed workloads, Cassandra shows the lowest operation latency, which is attributed to its wide-column structure and caching mechanisms. MongoDB performs strongly on read-heavy workloads but lags on write throughput. HBase and Couchbase trail Cassandra and MongoDB in both performance and scalability. For availability, Couchbase and Cassandra lead with mature cross-datacenter replication; MongoDB and HBase have improved their availability but still trail in some enterprise features. Overall, Cassandra emerges as the top choice, combining high write performance, linear scalability and the robust availability needed for large-scale big data applications. The benchmark results offer guidance for selecting a NoSQL database based on data volume, workload pattern and availability requirements.


Author Biography

Andrei Radu, Department of Information Technology, University of Sibiu, Romania