Engineering Blog

Learn from our challenges and triumphs as our talented engineering team offers insights for discussion and sharing.

Hackweek XXXVIII Recap

Hackweek is always a special time at LiveRamp, but an additional Thursday and Friday of hacking made this Hackweek 40% more special than ever before. A long-form Hackweek meant LiveRampers could pursue projects with larger scale and greater complexity, from infrastructure improvements to analytical tools to investments in company culture. As Hackweek has matured into ...

Seeking Map-Side Join

At LiveRamp, many of our Hadoop workflows join two datasets together (more datasets are supported, but for simplicity this post covers the case of two). To join two datasets efficiently, both have to be sorted, which happens in the Shuffle phase of MapReduce jobs. In general we ...

Debugging “ClassCastException: cascading.tap.hadoop.io.MultiInputSplit” exceptions when testing Cascading flows

When testing our Hadoop data workflows, we've intermittently run into this error, which ends up failing the MapReduce job being tested:

    java.lang.ClassCastException: cascading.tap.hadoop.io.MultiInputSplit cannot be cast to org.apache.hadoop.mapred.FileSplit
        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:371)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)

A quick search for the error didn't turn up any obvious problems. When we dug into the problem a bit more, we noticed a couple ...

Analyzing network load in Map/Reduce

Hadoop Map/Reduce can take a heavy toll on your network. Just how heavy, though, isn't obvious. This is an especially important consideration when you are expanding your cluster. LiveRamp recently encountered this situation, and in the process we devised a neat theoretical model for analyzing how network topology affects Map/Reduce. When does Hadoop put the ...