21 January 2021
In this post, Miro Michalicka from our portfolio company JOBIQO shares his insights on "Tuning application on GKE - Drupal with MySQL".
Miro presented this topic at our last remote CTO meetup; you can find the video of his talk here.
This post was also published on Medium.
Jobiqo is a provider of job board technology: we match applicants and companies on our web platform where companies publish job advertisements and applicants apply for them. Our technology stack is based on Drupal and PHP, where we leverage and contribute to a lot of open source components. In 2019 Jobiqo decided to join the Russmedia Equity Partners group to grow our business with the support of their knowledge and experience in similar business and technology areas. We are happy to work on our next generation hosting challenges with the Russmedia team!
When migrating from on-premise to cloud solutions, we have to take into consideration a few important aspects of the new setup:
An on-premise setup is very often not fully highly available. It typically runs on fairly large servers or virtual machines and, for that reason, is heavily optimised for performance. All components of the system, such as applications, caches and databases, sit on the same physical machine, so latency is not an issue there. The challenge we describe in this article is our road from an on-premise solution that is performance-optimised but not highly available to a cloud setup with proper HA. You may ask why we even consider going to the cloud. The answer is quite simple: to provide our customers with full HA and to deliver new features faster on Kubernetes.
We start our tuning story by presenting the initial on-premise setup:
We are hosting Drupal applications with MySQL as the main database while using Redis for caching, and Solr for searches.
Our initial setup in GCP:
Please note that to make the setup fully highly available, we had to use an NFS-like system for user-uploaded files in Drupal. Google's Filestore offering is not suitable for us because its minimum size is 1 TB, which is more than we need to store. Our colleagues from Russmedia Group recommended GlusterFS to us, nicely described here: GlusterFS on GCP.
With both setups presented, we need to compare their latency:
As you can see from the above, the latency of the HA setup in GCP is far from that of our on-premise setup. We are going to show you our transformations, made with some guidance from Google Cloud Architect Andrii Bereznikov.
Our new cloud setup is based on Kubernetes (whereas the on-premise one runs on Docker on a virtual machine). We are not going to describe how you can tune your Kubernetes cluster (you can find an interesting article here: tuning Kubernetes), but only the phases we went through, the modifications we made, and the latency measurements.
One of the biggest benefits of going to the cloud is having a database as a service. With Google offering MySQL with master-slave replication and quick failover, this is the first setup most companies will start with.
Drupal applications are very 'database-heavy' and do a lot of reads and writes in a single user page load. Even after removing the SQLProxy sidecar and using a private database connection, we still had huge latency issues. The cause was that the database as a service does not run on the same virtual machines as Kubernetes, so even a few milliseconds of latency, multiplied by many calls, makes a huge difference.
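To make the multiplication effect concrete, here is a rough back-of-the-envelope sketch in Python. The query count and round-trip times are illustrative assumptions, not measurements from our setup, and the function name is hypothetical.

```python
def db_network_overhead_ms(queries_per_page: int, round_trip_ms: float) -> float:
    """Time one page load spends waiting on database round trips,
    assuming the queries run one after another (no pipelining)."""
    return queries_per_page * round_trip_ms


# Illustrative numbers only (assumptions, not measured values).
QUERIES_PER_PAGE = 200

same_host = db_network_overhead_ms(QUERIES_PER_PAGE, 0.1)   # database on the same machine
managed_db = db_network_overhead_ms(QUERIES_PER_PAGE, 2.0)  # database reached over the network

print(f"same host: {same_host:.0f} ms, over the network: {managed_db:.0f} ms")
```

Under these assumptions the network alone adds several hundred milliseconds per page, which is why shaving even one millisecond off each round trip matters for a query-heavy application.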
We decided to try a multi-master Galera setup on GKE. We also evaluated Vitess.io (the database used by YouTube), but decided to stay with the solution we know.
This new setup allowed us to reduce the latency, while still having a fully high-available solution.
The downside is that we no longer have a fully managed database, so before moving in this direction, you have to be sure that you can maintain the database on your own.
Although at first glance it might not look important, DNS can really impact performance inside GKE clusters. Just by switching from short service names to fully qualified service names in GKE (galera → galera.NAMESPACE.svc.cluster.local), we saved around 80 ms in time to first byte!
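A minimal sketch of how such a fully qualified name can be built and timed from inside a pod. The helper names, the "production" namespace and port 3306 are assumptions for illustration; only the galera.NAMESPACE.svc.cluster.local pattern comes from our setup.

```python
import socket
import time


def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified in-cluster DNS name of a Kubernetes service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def time_lookup_ms(hostname: str, port: int = 3306) -> float:
    """Measure how long one DNS resolution of `hostname` takes, in milliseconds.
    Raises socket.gaierror if the name cannot be resolved."""
    start = time.perf_counter()
    socket.getaddrinfo(hostname, port)
    return (time.perf_counter() - start) * 1000.0


# "production" is a hypothetical namespace used only for this example.
DB_HOST = service_fqdn("galera", "production")
print(DB_HOST)
```

Running time_lookup_ms("galera") and time_lookup_ms(DB_HOST) from a pod in the cluster is one way to check whether the switch pays off in your own environment.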
When you start moving to the cloud, you could choose one of two initial options:
We decided to start with n2-standard-2, that is, relatively small VMs with 2 vCPUs. As hosting costs are a very important factor for us to stay competitive in the market, we prefer to scale the VMs up rather than pay for capacity that is idle most of the time. The issue we found is that our application uses a lot of CPU while communicating with the database.
We wouldn't have found these issues without a proper profiler. Based on the results, you might decide to rewrite part of your application, which, however, was not the case for us: Drupal is an open-source technology, so we need to maintain compatibility with the community version. For profiling we used NewRelic and, based on the findings there, we changed the VMs to c2-standard-4 (a compute-optimized machine type). This move gave us a huge boost.
What is not obvious at the beginning of your journey with the cloud is that the size of your SSD disks has a huge impact on performance. This is because read/write IOPS and throughput depend on the total size of the disk (disk performance). So even if you only need 10 GB for your data, the bigger the disk you provision for your application or database, the more IOPS and throughput you get. Of course, you have to find the right balance, because your cloud cost increases with bigger disks. For our GlusterFS solution, we ended up with 500 GB disks, and we get great performance.
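The relationship can be sketched as a simple linear model. The per-GB rate below is an assumed figure for illustration; the real rate and the upper limits depend on the disk type and VM size, so check the current GCP disk performance documentation before sizing anything. The function name is hypothetical.

```python
from typing import Optional


def estimated_disk_iops(size_gb: int, iops_per_gb: float,
                        max_iops: Optional[float] = None) -> float:
    """Estimate the IOPS limit of a disk whose performance scales linearly
    with its provisioned size, optionally capped at `max_iops`."""
    iops = size_gb * iops_per_gb
    return iops if max_iops is None else min(iops, max_iops)


# Assumed rate for illustration only; look up the real value for your disk type.
ASSUMED_IOPS_PER_GB = 30

for size_gb in (10, 500):
    iops = estimated_disk_iops(size_gb, ASSUMED_IOPS_PER_GB)
    print(f"{size_gb} GB disk: about {iops:.0f} IOPS")
```

With this assumed rate, a 10 GB disk is limited to a small fraction of what a 500 GB disk can deliver, even though both hold the same data.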
As you can see, it is possible to optimize cloud hosting to run monolithic applications such as Drupal. From our perspective, it's also very important to have the right expectations: you accept some loss in performance but gain benefits such as scalability and reliability. We are already onboarding our first customers to the cloud, and you can look forward to reading our lessons learned as we progress.