June 19, 2017
When query, schema and index optimizations aren’t enough to support the load of a high throughput database system, fundamental changes in how data flows through that system are often necessary.
Over the past few years, our team at Contactually has been fortunate enough to tackle numerous database performance issues, ranging from autovacuum tuning to full production database upgrades. Growth as a company doesn’t occur without these types of issues and we’re happy to share our experiences with our peers.
Load balancing is a fundamental concept in any system design. It’s important to design your data systems to handle the peaks or spikes in usage, as opposed to designing for the average load a system experiences.
By splitting queries between your primary and follower databases, you’re able to ensure that users interacting with your web application have a snappy and performant experience, while saving heavy background jobs for a follower.
There are a few options when it comes to choosing a database adapter that supports primary/follower query balancing. After evaluating Makara and Octopus, we decided to go with Makara.
Given the importance of a highly available database, we had to be very careful in how we rolled out these changes to production. The key takeaways were to:
Use a database config that could easily be reverted using ENV["USE_MAKARA"], in case we quickly needed to roll back our changes and use the default postgresql adapter.
<% if ENV["USE_MAKARA"] %>
  adapter: postgresql_makara
  makara:
    connections:
      - role: master
        url: <%= ENV["DATABASE_URL"] %>
      - role: slave
        name: replica
        url: <%= ENV["REPLICA_DATABASE_URL"] %>
<% else %>
  adapter: postgresql
  url: <%= ENV["DATABASE_URL"] %>
<% end %>
We ran into an issue with our first deployment where we needed to immediately roll back. Users’ access to Contactually was unaffected, and we were able to quickly address the problem and re-deploy. You can find more suggestions in this gist.
Pick a non-critical background job to introduce distribute_reads on. We picked the job that generates merge suggestions for our users. If the jobs failed for whatever reason, we could roll back and simply run them again.
distribute_reads do
  # your regular application logic
end
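As a sketch, a merge-suggestion style job wrapped in distribute_reads might look like the following. The job class, its logic, and the fallback stub are all hypothetical; the stub only stands in for the distribute_reads gem so the snippet runs on its own.

```ruby
# Stand-in for the distribute_reads gem so this sketch is self-contained;
# the real method routes the block's queries to a replica.
unless defined?(distribute_reads)
  def distribute_reads(**_options)
    yield
  end
end

# Hypothetical background job: the read-heavy work goes inside the block,
# so its SELECTs are sent to the follower database.
class MergeSuggestionJob
  def perform(contact_names)
    distribute_reads do
      # Group contacts sharing a name -- purely illustrative logic.
      contact_names.group_by(&:downcase).values.select { |group| group.size > 1 }
    end
  end
end
```

Because only reads happen inside the block, a failed run can simply be retried after a rollback, which is what makes a job like this a safe first candidate.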
Ensure you have proper monitoring in place. We use New Relic and Amazon RDS metrics to see the effect of our changes. If you can’t monitor results, you have no way of knowing how effective your changes were.
Effective monitoring helped us quickly (within seconds) determine that our initial rollout had an issue that needed to be quickly fixed.
If you’re introducing a follower database that has never been used or queried before, it’s important to make sure the database is warmed up before hitting it with lots of complex queries.
Just like a car in cold weather, you wouldn’t want to start your engine and slam on the gas before the engine has had a chance to warm up.
If the follower instances were recently created, initiate simple count queries for your most recently active users to warm both the database and operating system level caches. For more information, check out pg_prewarm.
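For illustration, that warm-up step can be as simple as generating a cheap count statement per recently active user, plus a pg_prewarm call for the hottest tables. The helper below is a hypothetical sketch; the table and column names are illustrative, not Contactually's actual schema.

```ruby
# Build warm-up SQL for a cold follower: one cheap COUNT per recently
# active user, plus pg_prewarm to pull a hot table into the buffer cache.
# Table and column names are illustrative placeholders.
def warmup_statements(recent_user_ids, hot_table: "contacts")
  statements = recent_user_ids.map do |id|
    "SELECT COUNT(*) FROM #{hot_table} WHERE user_id = #{Integer(id)};"
  end
  statements << "SELECT pg_prewarm('#{hot_table}');"
end
```

These statements would be executed against the follower's connection before routing any real traffic its way.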
Results so far have shown a staggering improvement in system-wide database performance. We saw a ~30% decrease in time spent in the database for our web-initiated queries and an almost total elimination of spikes in Queue Depth.
If you or your team have implemented a system like this, we’d love to hear your feedback and share what we’ve learned in greater depth!
For millions of professionals, relationships are the backbone of a viable business. Whether you’re working with clients, prospects, or potential investors, Contactually helps you build stronger relationships with the people who can make you successful.
Contactually is looking for experienced software engineers who are passionate about solving complex problems with code. We’ve taken a novel approach to building business software — focus on the end user — and it’s been working! Our users love us. Come help us build a product that makes contact management easy and rescue 10,000s of people from the jaws of clunky, outdated software.
Adding Read Replicas in a Production Ruby on Rails System with Zero Downtime Using Makara was originally published in Contactually Product and Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.