Engineering Blog

Learn from our challenges and triumphs as our engineering team shares its insights.

Running Hank on AWS

A key piece of LiveRamp’s data-distribution infrastructure has always been Hank, our open-source key-value data store. Hank powers a number of key applications at LiveRamp that require real-time data lookups, including identifier-sync requests to our pixel servers, client-side data distribution, and our real-time data pipelines. Over the past few months, we ...

Upgrading Cloudera Hadoop

Why Upgrade?

LiveRamp has relied heavily on MapReduce for our big-data computation since 2009. However, the Hadoop ecosystem has grown and matured dramatically over the past five years, and one of the biggest changes has been the shift from MapReduce-centric MRv1 to MRv2 -- Hadoop YARN. YARN separates the resource-allocation layer of a Hadoop cluster ...

Identifying Maven snapshot artifacts by git revision

Our build environment at LiveRamp includes quite a few actively developed library projects, both open-source and internal. Since engineers often coordinate changes between libraries and downstream projects when adding or refactoring methods, it’s important to be able to release new versions of these libraries quickly. To accomplish this, we make heavy use of ...
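The excerpt doesn’t show the exact setup described in the post, but one common way to tie a snapshot artifact to the git revision it was built from is the buildnumber-maven-plugin, which exposes the current revision as ${buildNumber} at build time. A sketch (the plugin coordinates and goal are standard; the configuration values are illustrative):

```xml
<!-- Illustrative sketch, not necessarily the post's setup: expose the
     current git revision as ${buildNumber} during the build. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>buildnumber-maven-plugin</artifactId>
  <executions>
    <execution>
      <phase>validate</phase>
      <goals>
        <goal>create</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <!-- abbreviate the revision, like `git rev-parse --short HEAD` -->
    <shortRevisionLength>7</shortRevisionLength>
  </configuration>
</plugin>
```

With that in place, ${buildNumber} can be folded into the jar’s final name or manifest, so any snapshot a downstream project depends on is traceable to an exact commit.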

Using PMD to blacklist unsafe methods

At LiveRamp, much of the code we write is built around the Hadoop ecosystem. While the tools in the ecosystem are very powerful, their APIs are evolving rapidly, and many have "gotcha" methods that can cause serious bugs when misused. We've found that certain methods were especially common sources of frustration among new developers. ...
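As a sketch of the technique, PMD can flag calls to a specific method with an XPath rule over the Java AST. The rule below is illustrative (FileSystem.get is a hypothetical example of an "unsafe" method, not necessarily one from the post’s actual ruleset):

```xml
<!-- Illustrative PMD ruleset: ban direct calls to FileSystem.get().
     The method chosen here is a hypothetical example. -->
<ruleset name="unsafe-methods"
         xmlns="http://pmd.sourceforge.net/ruleset/2.0.0">
  <description>Flag methods we consider unsafe to call directly.</description>
  <rule name="AvoidDirectFileSystemGet"
        language="java"
        message="Call our FileSystem wrapper instead of FileSystem.get()"
        class="net.sourceforge.pmd.lang.rule.XPathRule">
    <properties>
      <property name="xpath">
        <value><![CDATA[
//PrimaryPrefix/Name[@Image = 'FileSystem.get']
        ]]></value>
      </property>
    </properties>
  </rule>
</ruleset>
```

Running PMD with a ruleset like this in the build makes the blacklist self-enforcing: a new developer who reaches for the banned method gets a build-time message pointing at the safe alternative.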

Automatic logging of MapReduce task failures

When using Cascading to run MapReduce jobs in production, the most common exception we find in our job logs looks like this:

Caused by: cascading.flow.FlowException: step failed: (1/1), with job id: job_201307251526_37599, please see cluster logs for failure messages
    at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:210)
    at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:145)
    at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:120)
    at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

This exception tells us that the job failed ...
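The generic "please see cluster logs for failure messages" line is what motivates logging the real failure automatically. As a sketch of the first step (illustrative, not the post’s actual implementation), the job id embedded in the exception message is all we need to ask the JobTracker for the failed tasks’ diagnostics:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: recover the Hadoop job id from a FlowException message, since that
// id is enough to look the job up and fetch its failure diagnostics.
public class FailedJobIdExtractor {

  // MRv1 job ids look like job_201307251526_37599 (timestamp, then sequence)
  private static final Pattern JOB_ID = Pattern.compile("job_\\d+_\\d+");

  public static String extractJobId(String message) {
    Matcher matcher = JOB_ID.matcher(message);
    return matcher.find() ? matcher.group() : null;
  }

  public static void main(String[] args) {
    String message = "cascading.flow.FlowException: step failed: (1/1), "
        + "with job id: job_201307251526_37599, please see cluster logs";
    System.out.println(extractJobId(message)); // prints job_201307251526_37599

    // From here, Hadoop's old mapred API can fetch the actual failures
    // (shown as a comment since it needs a live cluster):
    //   RunningJob job = new JobClient(conf).getJob(JobID.forName(jobId));
    //   for each FAILED event in job.getTaskCompletionEvents(offset):
    //     log job.getTaskDiagnostics(event.getTaskAttemptId());
  }
}
```

With the diagnostics in hand, the real task-level exception can be logged next to the FlowException instead of sending engineers digging through cluster logs by hand.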

Debugging “ClassCastException: cascading.tap.hadoop.io.MultiInputSplit” exceptions when testing Cascading flows

When testing our Hadoop data workflows, we've intermittently run into this error, which fails the MapReduce job under test:

java.lang.ClassCastException: cascading.tap.hadoop.io.MultiInputSplit cannot be cast to org.apache.hadoop.mapred.FileSplit
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:371)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)

A quick search for the error didn't turn up any obvious answers. When we dug into the problem a bit more, we noticed a couple ...