MongoDB Vs MySQL – Benchmarks Re-Creating Typical CMS Functionality to Compare Performance of MySQL and MongoDB in PHP / Apache
5:21 am in Benchmarks, MongoDB, MySQL by Mark Smalley
Over the past few years there has been a gradual rise in prominence for a new breed of databases that is quickly gaining traction. NoSQL databases offer many advantages to developers over traditional Relational SQL databases, both in terms of development flexibility and the speed and instant scalability. The question that will be forefront in the mind of any developer working on a large project is “how does it perform?”. The bottom line for many projects will be “which database will run my application the best” – we at Laulima also had these questions, and after an extensive but not exhaustive search failed to find any definitive third-party statistics for the simple tasks.
Laulima is responsible for the technology and design of two high profile projects, http://www.travelblog.org/ is a PHP MySQL based travel CMS, and http://www.mongopress.org/ is a new MongoDB (NoSQL) PHP OpenSource CMS – so our immediate interest was in measuring the performance on these two specific databases. Both MySQL and mongoDB are open-source, are also both considered to be the standard of their classes.
And so in light of hosting the first-ever Kuala-Lumpur MongoDB User-Group, we wanted to present some easy to understand, realistic benchmarks for typical tasks that are usually associated with CMS platforms and how they might construct a page or interact with their users from a database-level.
Deciding on Typical CMS Usage Scenarios
With our experience developing TravelBlog, we have seen a single page require as many as 200 separate transactions to generate a page. This is admittedly a large number, and through the use of caching we have reduced this number down to a level where visitors rarely notice a pause. However, with a need to choose a benchmarking figure we selected this as our typical worst case “social-network” style page load. Analysing the typical number of reads Vs update operations, we opted for a ratio with 10 reads Vs 1 update.
Our benchmarks have used these numbers and ratios as the basis, they are not arbitrary but may differ from other less “heavy” CMS systems – we have also open-sourced our scripts and encourage those interested to download and adjust the parameters and server set-up as required. We are well aware that in specific usage scenarios, the results found could be very different.
Breaking Down Typical CMS Operations
NoSQL operates in a different way to SQL – where some of the operations that are fundamental to SQL are not the same in NoSQL. For instance; auto-incremented IDs and the fetching of random data do not make sense from a technical perspective in a NoSQL environment, and so the operations we simulated include:
- MySQL: Select – Update – Insert – Delete
- MongoDB: Find – FindAndModify – Save – Remove
We will randomly loop-through the PHP functions, meaning that certain functions will be called more than the others but the difference is is only in the region of 5%. We have taken high-profile website operations as the example, with lots of reads and select operations because the users and visitors are rarely inserting and updating data in comparison. We conducted the benchmarks using this operation in mind and both scripts have a query to run selects based on indexed column / fields, where no index but the index field will be run more than 5 consecutive times. All selects, updates and delete queries are based on random “lorem ipsum” data collected from an external text file.
Ensuring Fairness
We wrote our scripts in PHP, the language of nearly all our projects and 75% of all production websites (http://en.wikipedia.org/wiki/PHP#Usage). We used Apache/2.2.9 (Debian), PHP Version 5.2.6-1+lenny13 from Debian Lenny package manager, MongoDB 1.8.2, MySQL 5.0+ with MongoDB driver version 1.2.4 from pecl, MySQL client API driver version 5.0.51a and Apache Bench (ab) in order to conduct our benchmarks concurrently. We stripped down our scripts to use the most commonly used drivers for each database, did not include any additional third-party frameworks, and as such error handling was also minimized. These are scripts intended purely for determining performance, and should not be used as the basis of an application.
In both cases we were not only measuring the pure performance of the database, but were also looking at the overhead from PHP and Apache, but seeing as both set-ups are identical, and both are industry standards, we feel that the impact of this is fair on the tested databases. The differences in the performance of the scripts should be a result of the database alone and so to ensure this, each benchmark run was conducted on a clean boot-up, through the localhost address (so that network issues would not impact performance) and no PHP code optimizer (eaccelerator, apc, xcache) was installed. Both databases saved their data in XFS partitions to get the best performance and no applications other than those used for the benchmarks were in use.
Our Server
Our server utilised the following specifications:
- Linux Debian Lenny Kernel 2.6.26-2-amd64
- Intel(R) Core(TM) i5 CPU 661 @ 3.33GHz
- 4GB RAM
- HDD SATA Model Number: WDC WD5000AADS-00S9B0 (500GB)
This is a relatively new desktop workstation that we dual boot into a Linux server for cases like this. Unfortunately, we did not have the resources to test on a production level server (12 core, 16gb Linux servers in most cases) – but we anticipate that the levels of performance would hold true assuming that they were configured correctly.
The Results
This first graph shows MongoDB under-performing when given only a single task to do at a time:
But when MongoDB is given simultaneous connections and tasks, thinks become interesting:
Concluding Thoughts
The full logs for these benchmarks can be seen at – http://www.mongopress.org/presentations/benchmarks/benchmark_results_001.txt
So forgetting about other MongoDB golden-nuggets, such as Replica-Sets, Sharding, GridFS and Geo-Location, not to mention the inherently relevant way of storing data as documents whilst providing developers with more native ways to easily extend the database and in-turn allow for more rapid development; MongoDB clearly outperforms MySQL at even the simplest of tasks when given simultaneous connectivity, which is something every web-site and or web-based application should strive to need.
You may also be interested in How To: Import 7 Million Locations from GeoNames to MySQL, then into MongoDB so they can be compared and optimized for Location Queries.
Hey Mark,
Nice post on the difference between the two systems. Do you have any details on normal CRUD operations over differing size datasets? It’d be great to continue this research and see what the outcomes are – along with combining it with different performance tweaks and so on.
Matt