June 27, 2017

Creating Scalable Business Intelligence with Redshift, Stitch, Fivetran, and Looker

Title slide bg wide

Companies today strive to be more data-driven than ever before. There is a pervasive thought in business that data can be the differentiator between you and your competitors. If you collect the right data and leverage it effectively, you can create a competitive advantage that can bring strong and sustained growth to your business. However, the keys to creating a true data-driven company lie in the data infrastructure of your company.

In order for data to be effectively used to make decisions, it needs to be collected and stored in readily available systems. Furthermore, data in those systems must be cleaned and transformed into usable chunks that business owners can manipulate and visualize in order to gain insight. As your company continues to grow, those systems and processes that store and cleanse your data need to grow as well. Thankfully, the proliferation of cloud technologies has created an abundance of options for not only delivering data storage and transformation capability but also allowing companies to scale their data infrastructures as they grow.

Here at Contactually, we’ve built a robust data infrastructure on cloud technologies which have allowed us to improve our data quality as well as maintain the flexibility to introduce data from new services and integrate them seamlessly into our existing architecture. We’ll focus on the way we’ve designed our data infrastructure to give you an idea of how a fast-paced startup deals with delivering data insights.

Amazon Redshift — Cloud Storage with Analytics In Mind

AWS Redshift is a data warehousing solution that combines a scalable storage solution with high performance querying on structured data.

Amazon Web Services has become a major player in cloud computing and offering scalable solutions for low cost. We use Redshift as our primary data warehouse and replicate data from our application as well as our business systems.

As we’ve grown, we’ve had to resize our Redshift cluster to utilize more computing power in order to serve our data needs. Operations such as resizing a cluster takes only hours to a few days depending on the size of your warehouse. In our case, we had application data as well as data from all of our business systems, such as Salesforce and Zuora, and were still able to resize the cluster overnight with minimal disruption.

Data Replication At Scale

Stitch integration dashboard allowing the option to add new data sources

Fivetran integration dashboard allowing the option to add new data sources

With Redshift in place, we had a solid central repository for our data. Now, how does that data get into Redshift? There are a couple of options for replication services that make most integrations very simple. The two that we use are Fivetran and Stitch that allow easy replication from a multitude of cloud services. Connecting new data sources is as simple as authenticating an admin user for that specific system and choosing the tables that you want to replicate into Redshift.

What’s great about these integrations is that there is very little maintenance that is required from a data integrations team. Each service offers a dashboard with real-time updates about whether a service is having trouble with replication and you’re able to set the replication intervals right from that dashboard. The low maintenance cost allows you to focus your resources on innovating rather than maintaining.

“database plan” by tec_estromberg is licensed under CC BY 2.0

While those data replication services are simple and quick to set up, they do not currently offer integrations to every piece of software your company might use. At Contactually, we’ve supplemented our replication services with a custom Rails application that can serve data directly into Redshift. Some of the functions the Rails app helps out with includes:

  • Pushing data from Redshift and our app to Salesforce
  • Pushing data from Salesforce to Hubspot
  • Snapshotting data for analytics
  • Syncing GoToWebinar data

Of course, this requires development and maintenance time but it does allow us to pull in sources that we would otherwise be missing. For the most part, integrating other data sources has been simple since Redshift runs on a modified Postgres instance and many of the same Ruby tools that are available for Postgres work with Redshift.

Multi-Source Analytics At Scale

With all of the important data sources synced into Redshift, you still need to expose that data to the rest of the company. The tool we use for this is called Looker, which sits right on top of Redshift and provides another layer of abstraction to allow for calculated fields and aggregations to be made before finally exposing the data to internal teams. The tool also gives us the opportunity to define joins between data sources so everyone is aggregating and viewing the same data.

The most valuable part of this whole stack is the fact that we can use Looker to join across systems for reporting. For example, one of our reports shows information from our marketing system that collects data points about visitors to the site and marries this data with Salesforce and Product information. So you end up with a comprehensive funnel consisting of people who visited the site at the top of the funnel all of the way down to the converted users at the bottom.

By having all of these sources in one place, there has been a significant uptick in adoption of Looker by almost all of our departments. We’ve essentially broken the data silos that tend to occur when reporting is done out of source systems and created a holistic data environment where people can go into one place to find what they’re looking for.

What’s next?

So we have a robust Redshift instance that is being fed data from multiple systems. Maintenance on this whole stack is relatively low and we have enabled our business teams to view data across our company without too much intervention from our BI team. Now what?

Well, the answer is to move on to building out higher level data analysis capabilities. Currently, we are working on using our data and machine learning to answer more complicated questions as well as provide predictions for our business units. The time saved on developing and maintaining integrations as well as the self-serve nature of our Looker instance gives our BI team time to innovate. Hopefully, we can use resources and time saved to build out a robust machine learning infrastructure that can be used in conjunction with our current stack to help Contactually grow even quicker.

For millions of professionals, relationships are the backbone of a viable business. Whether you’re working with clients, prospects, or potential investors, Contactually helps you build stronger relationships with the people who can make you successful.

Contactually is looking for experienced software engineers who are passionate about solving complex problems with code. We’ve taken a novel approach to building business software — focus on the end user — and it’s been working! Our users love us. Come help us build a product that makes contact management easy and rescue 10,000s of people from the jaws of clunky, outdated software.

Creating Scalable Business Intelligence with Redshift, Stitch, Fivetran, and Looker was originally published in Contactually Product and Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.