
21 January 2021

Tuning application on GKE - Drupal with MySQL

How Jobiqo cares about their customers’ performance and high availability

 

Welcome to the sixth edition of our series on technology case studies from our portfolio companies.
Check out the first, second, third, fourth and fifth editions in case you missed them!

In this post, Miro Michalicka from our portfolio company JOBIQO gives you more insight into "Tuning application on GKE - Drupal with MySQL".
This was Miro’s presentation at our last remote CTO meetup. You can find the video of his talk here.

This post was also posted on Medium.

Introduction to Jobiqo and why they decided to move to GCP

Jobiqo is a provider of job board technology: we match applicants and companies on our web platform where companies publish job advertisements and applicants apply for them. Our technology stack is based on Drupal and PHP, where we leverage and contribute to a lot of open source components. In 2019 Jobiqo decided to join the Russmedia Equity Partners group to grow our business with the support of their knowledge and experience in similar business and technology areas. We are happy to work on our next generation hosting challenges with the Russmedia team!

Description of Jobiqo setup and first results

When migrating from on-premise to a cloud solution, we have to take a few important aspects of the new setup into consideration:

  • high availability: whether the application needs to be available to users all the time
  • performance: how the performance numbers (e.g. latency) will compare with the on-premise setup
  • costs of the cloud setup

An on-premise setup is very often not fully highly available. It usually runs on quite big servers or virtual machines and is, for that reason, very performance-optimised. All components of the system, such as applications, caches and databases, are located on the same physical machine, so latency is not an issue there. The challenge we describe in this article is our road from an on-premise solution that is very performance-optimised but not highly available to a cloud setup with proper HA. You may ask why we even consider going to the cloud. The answer is quite simple: to provide our customers with full HA and to be able to deliver new features faster on Kubernetes.

Before we start tuning, here is our initial on-premise setup:

Setup on-premise

We are hosting Drupal applications with MySQL as the main database while using Redis for caching, and Solr for searches.

Our initial setup in GCP:

Initial setup in GCP

Please note that to make the setup fully highly available, we had to use an NFS-like system for user-uploaded files in Drupal. Google’s Filestore offering is not suitable for us, because its minimum size is 1 TB and we don’t need to store that much data. Our colleagues from Russmedia Group recommended GlusterFS to us, which is nicely described here: GlusterFS on GCP

Having both solutions presented, we need to compare the latency:

As you can see from the above, the latency of the HA setup in GCP is far from what we get on-premise. We are going to show you the changes we made, with some guidance from a Google Cloud Architect, Andrii Bereznikov.

Tuning application on GKE in phases

Our new cloud setup is based on Kubernetes, whereas the on-premise one runs on Docker on a virtual machine. We are not going to describe how to tune a Kubernetes cluster (you can find an interesting article here: tuning Kubernetes). Instead, we describe only the phases we went through, the modifications we made in each, and the latency measurements.

Phase I

Switch CloudSQL -> MySQL HA on GKE

 

One of the biggest benefits of going to the cloud is having a database as a service. With Google offering MySQL with master-slave replication and quick failover, this is the first setup most companies will start with.

Drupal applications are very ‘database-heavy’ and do a lot of reads and writes in a single user page load. Even after removing the SQLProxy sidecar and using a private database connection, we still had huge issues with latency. The cause was that the database as a service does not run on the same virtual machines as Kubernetes, so even a few milliseconds of latency, multiplied by many calls, makes a huge difference.
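To make that multiplication concrete, here is a minimal back-of-the-envelope sketch in Python. The query count and round-trip times are hypothetical numbers chosen for illustration, not measurements from our setup, and the model assumes that the queries of a page load run one after another.

```python
def added_latency_ms(queries_per_page: int, round_trip_ms: float) -> float:
    """Time spent on database network round trips during one page load.

    Assumes the queries are issued sequentially, so every round trip
    adds directly to the response time of the page.
    """
    if queries_per_page < 0 or round_trip_ms < 0:
        raise ValueError("inputs must be non-negative")
    return queries_per_page * round_trip_ms


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    queries = 200
    for label, rtt_ms in [("database on the same host", 0.1),
                          ("database on a separate service", 1.5)]:
        total = added_latency_ms(queries, rtt_ms)
        print(f"{label}: {total:.0f} ms per page load")
```

Under these assumptions, a round trip that looks negligible for a single query dominates the page load once it is paid a few hundred times.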

We decided to try a multi-master Galera setup on GKE. We also evaluated Vitess.io (the database used by YouTube), but we decided to stay with the solution we know.

This new setup allowed us to reduce the latency, while still having a fully high-available solution.

Phase I latency

The downside is that we no longer have a fully managed database, so before moving in this direction, you have to be sure that you can maintain the database on your own.

Phase II

Use fully qualified service names

 

Although at first glance it might not look important, DNS can really impact the performance inside GKE clusters. Just by switching from service names to fully qualified service names in GKE (galera → galera.NAMESPACE.svc.cluster.local) we have saved around 80ms in the time to the first byte!

Phase II latency
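If you want to check the effect in your own cluster, one option is to time name resolution for both forms of the name from inside a pod. The following Python sketch is only one possible way to do that: the helper names are our own for this example, `cluster.local` is assumed to be the cluster domain, and resolver caching can influence the numbers.

```python
import socket
import statistics
import time
from typing import Callable


def service_fqdn(service: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified in-cluster name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def median_lookup_ms(name: str, attempts: int = 20,
                     resolve: Callable[[str], object] = socket.gethostbyname
                     ) -> float:
    """Median wall-clock time of `attempts` lookups of `name`, in ms.

    `resolve` is injectable so the timing loop can be tried out without
    a real DNS server; by default the system resolver is used.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        resolve(name)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Inside a pod you could then compare `median_lookup_ms("galera")` with `median_lookup_ms(service_fqdn("galera", "NAMESPACE"))`, replacing `NAMESPACE` with the namespace of your service.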

Phase III

Choose the proper virtual machine types

 

When you start moving to the cloud, you could choose one of two initial options:

  • use similar VM sizes to the ones you have on-premise

or

  • go with small VMs and test the performance

We decided to start with n2-standard-2, which are relatively small VMs with 2 vCPUs. As hosting costs are a very important factor for us to stay competitive in the market, we prefer to scale the VMs up when needed rather than pay for capacity that sits idle most of the time. The issue we found is that our application uses a lot of CPU while communicating with the database.

We wouldn’t have found this issue without a proper profiler. Based on the profiling results, you might decide to rewrite part of your application, but that wasn’t an option for us: Drupal is an open-source technology, so we need to maintain compatibility with the community version. For profiling, we used New Relic, and based on the findings there, we changed the VMs to c2-standard-4, a compute-optimized machine type. This move gave us a huge boost.

Phase IV

Choose a proper disk size

 

What is not obvious at the beginning of your journey with the cloud is that the size of your SSD disks has a huge impact on performance. This is due to the fact that read/write IOPS and throughput depend on the total size of the disk (disk performance). So even if you only need 10 GB for your data, the bigger the disk you provision for your application or database, the more IOPS and throughput you get. Of course, you have to find the right balance, because your cloud costs increase with bigger disks. For our GlusterFS solution, we ended up with 500 GB disks, which give us great performance.

Phase IV latency
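The relationship between disk size and performance can be sketched as a simple linear model with an upper cap. The Python snippet below is an illustration only: the per-GB rates and the caps are placeholder values we picked for the example, and the real figures depend on the disk type and the VM type, so check the current Google Cloud documentation before relying on them.

```python
from typing import NamedTuple


class DiskLimits(NamedTuple):
    iops: float
    throughput_mbps: float


def estimated_disk_limits(size_gb: float,
                          iops_per_gb: float,
                          mbps_per_gb: float,
                          max_iops: float,
                          max_mbps: float) -> DiskLimits:
    """Estimate disk limits that grow linearly with size up to a cap."""
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    return DiskLimits(
        iops=min(size_gb * iops_per_gb, max_iops),
        throughput_mbps=min(size_gb * mbps_per_gb, max_mbps),
    )


if __name__ == "__main__":
    # Placeholder rates and caps, for illustration only.
    rates = dict(iops_per_gb=30, mbps_per_gb=0.48,
                 max_iops=15_000, max_mbps=1_200)
    for size_gb in (10, 100, 500):
        limits = estimated_disk_limits(size_gb, **rates)
        print(f"{size_gb:>4} GB: {limits.iops:,.0f} IOPS, "
              f"{limits.throughput_mbps:.1f} MB/s")
```

With a model like this, you can see quickly where adding more gigabytes stops paying off because a cap has been reached.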

Summary

As you can see, it is possible to optimize cloud hosting to run monolithic applications such as Drupal. From our perspective, it’s also very important to have correct expectations and accept some losses in terms of performance but gain benefits such as scalability and reliability. We are already on-boarding our first customers to the cloud and you can look forward to reading our lessons learned as we progress.

If you would like to see more, please visit our Medium site or YouTube channel.

