LiveRamp Hackweek: Serializing and Transmitting Bytecode Between JVMs

LiveRamp has a long tradition of holding Hackweeks roughly once per quarter. During these weeks, our engineers are encouraged to work on whatever projects they find most interesting or most valuable that fall off the beaten path and may not get prioritized during our normal sprints. These projects range from deeply practical proofs of concept for new products (LiveRamp’s onboarding product started as a Hackweek project!), to improvements to office life, to experimentation with cutting-edge technologies or techniques.

While we obviously highly value those Hackweek projects which become a part of our product suite or technology stack, we also place a lot of value on projects that ultimately don’t work out. Hackweek is at its best when people are taking risks and trying new things, and we often learn a lot from projects that pushed our boundaries and made us consider problems from new angles.

In this post, I’d like to describe one of those projects: Remote Code. This project was ultimately never used at LiveRamp, but it inspired a lot of conversation and thought about the limits of service-oriented architectures designed for reuse and composition.

Remote Code

What if we could build “higher-order” big-data applications in the same way we build higher-order functions? Functional programming languages get huge mileage out of composable, reusable functions such as map and reduce, where a function takes in behavior that corresponds to an interface as an argument and makes use of it without needing any changes or special casing to handle any possible implementation of that interface. Could we unlock new areas of application design by accepting code as the argument to a service?

Remote Code is an experimental project which gives us that capability on the JVM. By serializing not just the data of an object, but also the bytecode forming its implementation, we can transmit arbitrary implementations of a shared interface between JVMs.

At a high level, the library works by serializing the data for an object alongside the bytecode definitions of every class that the code for that object ultimately depends on. On the remote side, we use these definitions to build a new classloader that can deserialize and execute our object. Finally, we use a Proxy to enable communication between the local classloader and this new “foreign” classloader and its object.

Serializing Bytecode

The first part of using Remote Code is to serialize some object which implements the interface we care about. Imagine we want to pass a ranking function to an application which searches a multi-terabyte dataset and returns the top 10 results. The interface for the ranking function might look like this:

import java.io.Serializable;

public interface RankingFunction extends Serializable {
  public int rank(byte[] record);
}

Note that we want the implementations of this interface to be transmittable, so we go ahead and require them to be Serializable so that we don’t run into unexpected issues later.
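To make this concrete, here is a minimal sketch of what an implementation might look like. The class name ByteCountRankingFunction and its scoring rule are invented for illustration and are not part of Remote Code. The interface is repeated so that the snippet compiles on its own.

```java
import java.io.Serializable;

// Repeated from above so this sketch is self-contained.
interface RankingFunction extends Serializable {
  int rank(byte[] record);
}

/** Hypothetical implementation: ranks a record by how often a target byte occurs in it. */
final class ByteCountRankingFunction implements RankingFunction {
  private static final long serialVersionUID = 1L;

  // Instance state: this is the part that ordinary Java serialization carries.
  private final byte target;

  ByteCountRankingFunction(byte target) {
    this.target = target;
  }

  @Override
  public int rank(byte[] record) {
    int count = 0;
    for (byte b : record) {
      if (b == target) {
        count++;
      }
    }
    return count;
  }
}
```

Standard Java serialization would write only the class name and the value of target for such an object. It would not write the compiled code of rank, and that is the gap the bytecode map described below fills.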

Turning this into a transmittable object is easy – we pass it to a static factory method on RemoteCodeObject to get back an instance of RemoteCodeObject, which can be serialized and sent to any JVM.

RankingFunction myFn = new MyRankingFunction();
RemoteCodeObject<RankingFunction> rco = 
  RemoteCodeObject.toRemoteCodeObject(myFn);

Inside this call is where a lot of the magic happens, so we’ll break down exactly what this method does.

Inside a RemoteCodeObject are two important things: the serialized object we want to transmit, and a map of class names to byte arrays representing the bytecode definitions of all classes our object relies on to operate. We build this map following a recursive procedure where we use ClassLoader.getResourceAsStream to get the bytecode of the .class file that contains a particular class, add that bytecode to the map, and then use the javassist library’s CtClass.getRefClasses() method to find all classes referenced in that class. We perform a depth-first search of the tree of dependencies to find all the code necessary to use the object we ultimately want to transmit. We specifically filter out classes that we can expect to be shared (mostly JVM-provided classes like String, Integer, etc.) to reduce the size of the object we send over the wire. The library allows the addition of ignored packages that are specific to your environment to further reduce serialized size.

The serialized object represents any state we need for the specific object instance we want to send, while this bytecode map gives us the actual definitions we’ll need in order to run our code on the other side.
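The collection procedure can be sketched roughly as follows. This is not the library's code: BytecodeCollector and its method names are hypothetical. To keep the example free of external dependencies, it also swaps javassist's CtClass.getRefClasses() for a hand-rolled scan of the CONSTANT_Class entries in the class file's constant pool, which is only an approximation of what javassist reports. It assumes Java 9+ for InputStream.readAllBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that illustrates the idea; not the actual Remote Code implementation. */
final class BytecodeCollector {

  /** Reads the .class resource for a class, or returns null if the loader cannot find it. */
  static byte[] readClassBytes(ClassLoader loader, String className) throws IOException {
    String resource = className.replace('.', '/') + ".class";
    try (InputStream in = loader.getResourceAsStream(resource)) {
      return in == null ? null : in.readAllBytes();
    }
  }

  /** Stand-in for javassist: names found in CONSTANT_Class entries of the constant pool. */
  static Set<String> referencedClasses(byte[] classBytes) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes));
    in.readInt();           // magic number
    in.readUnsignedShort(); // minor version
    in.readUnsignedShort(); // major version
    int count = in.readUnsignedShort();
    String[] utf8 = new String[count];
    List<Integer> classNameIndexes = new ArrayList<>();
    for (int i = 1; i < count; i++) {
      int tag = in.readUnsignedByte();
      switch (tag) {
        case 1: utf8[i] = in.readUTF(); break;                         // Utf8
        case 7: classNameIndexes.add(in.readUnsignedShort()); break;   // Class
        case 5: case 6: in.readLong(); i++; break;                     // Long/Double take two slots
        case 8: case 16: case 19: case 20: in.readUnsignedShort(); break;
        case 15: in.readUnsignedByte(); in.readUnsignedShort(); break; // MethodHandle
        case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
          in.readInt(); break;
        default: throw new IOException("Unknown constant pool tag: " + tag);
      }
    }
    Set<String> names = new TreeSet<>();
    for (int index : classNameIndexes) {
      String name = utf8[index];
      if (name.startsWith("[")) {                 // array type, e.g. [Ljava/lang/String; or [B
        name = name.substring(name.lastIndexOf('[') + 1);
        if (!name.startsWith("L")) continue;      // array of primitives: nothing to load
        name = name.substring(1, name.length() - 1);
      }
      names.add(name.replace('/', '.'));
    }
    return names;
  }

  /** Depth-first walk from root, skipping any class whose name starts with an ignored prefix. */
  static Map<String, byte[]> collect(ClassLoader loader, String root, List<String> ignoredPrefixes)
      throws IOException {
    Map<String, byte[]> bytecode = new LinkedHashMap<>();
    Set<String> seen = new HashSet<>();
    Deque<String> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      String name = stack.pop();
      if (!seen.add(name) || ignoredPrefixes.stream().anyMatch(name::startsWith)) continue;
      byte[] bytes = readClassBytes(loader, name);
      if (bytes == null) continue;                // e.g. classes generated at runtime
      bytecode.put(name, bytes);
      for (String dependency : referencedClasses(bytes)) stack.push(dependency);
    }
    return bytecode;
  }
}
```

A call such as BytecodeCollector.collect(loader, "com.example.MyRankingFunction", List.of("java.", "javax.")) (the class name here is a placeholder) would return the map for that class and everything it references outside the ignored packages. The seen set is what keeps the walk from looping on circular references between classes.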

Instantiating Foreign Objects

Once we’ve received the serialized RemoteCodeObject, we want to deserialize it in a way that:

  1. Allows us to use the object with the bytecode that was sent over for it
  2. Isolates the bytecode that our application relies on from the bytecode in the foreign object to prevent any unexpected behavior or crashes from using the wrong versions of things

RemoteCodeObject takes care of this for the user. To get an object that Just Works™, we deserialize the RemoteCodeObject via the normal methods, and then call toProxy(Class interfaceClass) to tell it which interface to respect.

byte[] serializedRCO = receiveOverWire(); // Get data from user somehow
RemoteCodeObject<RankingFunction> rco = 
  SerializationUtils.deserialize(serializedRCO);
RankingFunction foreignFunction = 
  rco.toProxy(RankingFunction.class);

Once again, internal to these calls there’s a lot of JVM magic to make things work. In this case, ForeignBytecodeClassloader intercepts any call to load a class and first checks whether our transmitted map contains bytecode for that class that differs from the local definition of that class. If so, we use the defineClass method to define the class with our custom bytecode – otherwise, we delegate to the normal JVM classloader.

Once we’ve built this Classloader to respect our transmitted code, we use AlternateLoaderObjectInputStream to deserialize the object contained in the RemoteCodeObject. This InputStream makes sure to invoke our alternate loader and gives our foreign object the classes it needs to function.
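The two pieces described above can be approximated with a few dozen lines of standard-library Java. The class names TransmittedBytecodeLoader and LoaderObjectInputStream are hypothetical stand-ins and not the library's real classes. The sketch also assumes a non-null parent loader and Java 9+ for InputStream.readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;
import java.util.Arrays;
import java.util.Map;

/** Hypothetical stand-in: defines transmitted classes itself, delegates everything else. */
final class TransmittedBytecodeLoader extends ClassLoader {
  private final Map<String, byte[]> bytecode;

  TransmittedBytecodeLoader(Map<String, byte[]> bytecode, ClassLoader parent) {
    super(parent);
    this.bytecode = bytecode;
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
      Class<?> loaded = findLoadedClass(name);
      if (loaded == null) {
        byte[] bytes = bytecode.get(name);
        if (bytes == null || !differsFromLocal(name, bytes)) {
          return super.loadClass(name, resolve); // normal parent-first delegation
        }
        loaded = defineClass(name, bytes, 0, bytes.length);
      }
      if (resolve) {
        resolveClass(loaded);
      }
      return loaded;
    }
  }

  /** True when the parent has no copy of the class, or its copy is not byte-for-byte identical. */
  private boolean differsFromLocal(String name, byte[] bytes) {
    String resource = name.replace('.', '/') + ".class";
    try (InputStream in = getParent().getResourceAsStream(resource)) {
      return in == null || !Arrays.equals(in.readAllBytes(), bytes);
    } catch (IOException e) {
      return true; // cannot compare, so prefer the transmitted definition
    }
  }
}

/** Hypothetical stand-in: an ObjectInputStream that resolves classes through a given loader. */
final class LoaderObjectInputStream extends ObjectInputStream {
  private final ClassLoader loader;

  LoaderObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
    super(in);
    this.loader = loader;
  }

  @Override
  protected Class<?> resolveClass(ObjectStreamClass desc)
      throws IOException, ClassNotFoundException {
    try {
      return Class.forName(desc.getName(), false, loader);
    } catch (ClassNotFoundException e) {
      return super.resolveClass(desc); // covers primitive types such as int
    }
  }
}
```

Overriding loadClass rather than findClass is deliberate in this sketch. The default parent-first behavior would always hand back the local definition when one exists, so the transmitted bytecode would never be used for a class that is present on both sides.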

Note that we can’t count on any particular class that is used by our new foreign object to be “the same” as the classes in our JVM – even a single byte difference in the bytecode means that the class will be loaded in our alternate classloader, and thus the JVM will regard the classes as distinct. This means we can’t directly cast our object to the interface we’re expecting – we have to use a Proxy instead. The proxy object we create does implement the interface we want, and it forwards methods reflectively to the foreign object.

This also makes it especially important that the interfaces we’re passing around communicate with classes that we know are going to be the same between the JVMs. Primitives and other classes included in every JRE are the safest bet.
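A reflective forwarding proxy of this kind might look like the sketch below. The names Ranker, ForeignLengthRanker, and ProxySketch are made up for the example. ForeignLengthRanker deliberately does not implement Ranker, which mimics a foreign object whose interface was loaded by a different classloader and so cannot be cast to the local one.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Local interface, standing in for RankingFunction.
interface Ranker {
  int rank(byte[] record);
}

/** Has a matching rank method but is NOT a Ranker as far as the JVM is concerned. */
final class ForeignLengthRanker {
  public int rank(byte[] record) {
    return record.length;
  }
}

final class ProxySketch {
  /** Wraps foreign in a proxy that implements iface and forwards each call by reflection. */
  static <T> T toProxy(Object foreign, Class<T> iface) {
    Object proxy = Proxy.newProxyInstance(
        iface.getClassLoader(),
        new Class<?>[] {iface},
        (self, method, args) -> {
          // The lookup by name and parameter types only succeeds when the parameter
          // classes are the same Class objects on both sides of the proxy.
          Method target =
              foreign.getClass().getMethod(method.getName(), method.getParameterTypes());
          target.setAccessible(true);
          try {
            return target.invoke(foreign, args);
          } catch (InvocationTargetException e) {
            throw e.getCause(); // surface the foreign method's own exception
          }
        });
    return iface.cast(proxy);
  }
}
```

With this in place, ProxySketch.toProxy(new ForeignLengthRanker(), Ranker.class) returns an object that can be used as a Ranker even though the wrapped object cannot be cast to one. The getMethod lookup also shows why shared parameter types matter: byte[] is the same class in every loader, but a parameter type that had been redefined in the foreign classloader would not match, and the lookup would fail with NoSuchMethodException.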

Conclusion

At LiveRamp, we ultimately decided that Remote Code was a little too bleeding edge for our tastes – it opened up the possibility of bugs in transmitted code crashing other teams’ applications, created applications that could only be used in environments of absolute trust, and added a layer of unknowns that outweighed the potential benefits. We felt the risks were a fair bit higher than the rewards for our code base.

That being said, we learned a lot from the project about JVM internals, and it has helped inform interface design to some extent from that point on. By identifying what we didn’t like about the model, we learned something about what was important to us about how our applications functioned and what level of uncertainty could be injected from the outside. Finally, it made for a really fun hackweek!