Joining Petabytes of Data Per Day: How LiveRamp Powers its Matching Product

Our data matching service processes ~10 petabytes of data and generates ~1 petabyte of compressed output every day. It continuously uses ~25k CPU cores and ~50 terabytes of RAM on our Hadoop clusters. These numbers keep growing as more data flows through our platform. How can we efficiently process the […]
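The excerpt only hints at the mechanics, but at its core a match at this scale is a key join of very large identifier datasets on Hadoop. As a hedged sketch (the dataset names, schema, paths, and the use of Spark here are illustrative assumptions, not LiveRamp's actual pipeline), the shape of such a job looks something like this:

```scala
// Illustrative sketch only: a petabyte-scale "match" reduced to its essence,
// a key join of two identifier datasets. All names and paths are hypothetical.
import org.apache.spark.sql.SparkSession

object MatchJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("match-join-sketch")
      .getOrCreate()

    // Two hypothetical inputs, each keyed on an anonymized identifier column "id".
    val left  = spark.read.parquet("hdfs:///data/identifiers_a")
    val right = spark.read.parquet("hdfs:///data/identifiers_b")

    // The join shuffles both sides by key; at ~10 PB of input a day, partition
    // counts and executor sizing dominate tuning far more than the code itself.
    val matched = left.join(right, Seq("id"))

    // Compressed, columnar output keeps the ~1 PB/day result compact.
    matched.write
      .option("compression", "snappy")
      .parquet("hdfs:///output/matched")

    spark.stop()
  }
}
```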

Migrating a Big Data Environment to the Cloud, Part 5

What next? In the previous posts about our migration, we asked ourselves: Why do we want to move to the cloud? What do we want our Day 1 architecture to look like? How do we get there? How do we handle our bandwidth constraints? The last and most exciting questions are: “What comes next?” “How […]

Migrating a Big Data Environment to the Cloud, Part 4

Copying to the cloud: LiveRamp is in the midst of a massive migration of all of our infrastructure to GCP. In our previous posts we talked about our migration and our decision to use Google as our cloud. In this post, I want to zoom in on one major problem we needed to solve to […]
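The post itself digs into the real tooling and the bandwidth constraints; as a minimal, hedged sketch of the problem's shape (the paths, bucket name, and single-process copy here are illustrative assumptions, not the approach the post describes), Hadoop's FileSystem abstraction is what makes a cross-store copy look uniform:

```scala
// Illustrative sketch only: copying one HDFS path into a GCS bucket through
// Hadoop's FileSystem API. All paths and bucket names are hypothetical.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

object HdfsToGcsCopySketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()

    val src = new Path("hdfs:///data/events/2019-01-01")          // hypothetical source
    val dst = new Path("gs://example-bucket/events/2019-01-01")   // hypothetical bucket

    // The gs:// scheme resolves through the GCS connector on the classpath.
    val srcFs = FileSystem.get(src.toUri, conf)
    val dstFs = FileSystem.get(dst.toUri, conf)

    // A single-process copy like this could never keep up at petabyte scale;
    // a real migration distributes the work (e.g., with DistCp). This only
    // shows that one API spans on-prem HDFS and cloud storage.
    FileUtil.copy(srcFs, src, dstFs, dst, /* deleteSource = */ false, conf)
  }
}
```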

Migrating a Big Data Environment to the Cloud, Part 3

How do we get there? In part 2 of this series, we discussed what we wanted our cloud MVP to look like. The next question was: how do we get there without turning the company off for a month? We started with what we knew we needed. For at least a few months, our […]

Migrating a Big Data Environment to the Cloud, Part 2

Starting the journey: Last post we discussed why we were migrating to the cloud, and to GCP in particular. Once we knew we were migrating, we started by asking ourselves three questions: What will our cloud architecture look like on Day 1? We know there’s a lot of exciting stuff we could do in the cloud — […]

Migrating a Big Data Environment to the Cloud, Part 1

LiveRamp is a big data company. A lot of companies have big data.  Robust logging frameworks can generate a PB of logs before breakfast and stash it away forever on S3, just in case. A few companies even use their big data.  They have a product, and then use Hadoop and Spark to do some […]