Data Onboarding System Overview




LiveRamp brings offline data online. Our customers send us large “offline” datasets of user records, and we deliver anonymized versions of those records to an “online” destination, such as an ad network or data management platform. By “online”, we mean that the data record is anonymized and associated with a browser or device, enabling the customer to run ad campaigns that retarget their audience, or to measure offline conversions in response to an online campaign.

While a great deal of sophisticated technology has been developed to enable this service, the overall system can be understood in terms of three core functional components that mirror the flow of data through the system: segmentation, matching, and distribution.


Segmentation

A customer sends us a file of user records that we need to turn into segments in our system. By “segmentation”, I refer to the entire process of reading the file, importing the data, constructing segments based on the values in the file, and enabling the customer to define dynamic segments derived from the base data.
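To make the segmentation step concrete, here is a minimal sketch in Python. It is not our production code: the CSV layout, the field names (`email`, `state`, `loyalty_tier`), and the helper functions `build_base_segments` and `dynamic_segment` are all assumptions made for illustration. It builds one base segment per (field, value) pair found in the file, then derives a dynamic segment by combining base segments.

```python
import csv
import io
from collections import defaultdict


def build_base_segments(file_obj, key_field):
    """Map each (field, value) pair to the set of record keys that carry it."""
    segments = defaultdict(set)
    for row in csv.DictReader(file_obj):
        record_key = row[key_field]
        for field, value in row.items():
            # Skip the key column, empty cells, and overflow columns.
            if field is None or field == key_field or not value:
                continue
            segments[(field, value)].add(record_key)
    return dict(segments)


def dynamic_segment(base, include, exclude=()):
    """Records in every `include` segment and in none of the `exclude` segments."""
    if not include:
        raise ValueError("at least one segment must be included")
    # set.intersection returns a new set, so the base segments are not mutated.
    result = set.intersection(*(base.get(s, set()) for s in include))
    for s in exclude:
        result -= base.get(s, set())
    return result


# Hypothetical customer file; real files are far larger and messier.
SAMPLE = """email,state,loyalty_tier
a@example.com,CA,gold
b@example.com,CA,silver
c@example.com,NY,gold
"""

base = build_base_segments(io.StringIO(SAMPLE), key_field="email")
ca_gold = dynamic_segment(base, include=[("state", "CA"), ("loyalty_tier", "gold")])
ca_not_gold = dynamic_segment(
    base, include=[("state", "CA")], exclude=[("loyalty_tier", "gold")]
)
```

Keeping base segments as plain sets means a dynamic segment is just a set expression over them, which is one simple way to let customers define new segments without re-importing the file.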


Matching

Customer records are keyed by some sort of identifier, such as an email address, postal address, or geographical code. Matching is the process by which we associate a customer record with an appropriate set of keys that can be used to match that record into the online ecosystem. For example, we may match a customer record to an entity in our system that corresponds to a handful of email and postal addresses; those in turn will be matched to online identifiers. We offer our customers the flexibility to balance precision and reach: for a customer who can tolerate lower precision in order to achieve greater reach, we may match a record based on an individual match, a household-level match, or even a geographical-area match.
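The precision-versus-reach trade-off described above can be sketched as a fallback chain. This is only an illustration under stated assumptions: the `MatchLevel` enum, the three in-memory lookup tables, the record field names, and the `match_record` function are hypothetical, and a real matching system would work against far larger, anonymized indexes.

```python
from enum import IntEnum


class MatchLevel(IntEnum):
    """Lower values are more precise; higher values give more reach."""
    INDIVIDUAL = 1
    HOUSEHOLD = 2
    GEO = 3


# Hypothetical lookup tables standing in for the real match indexes.
INDIVIDUAL_INDEX = {"a@example.com": "person-1"}
HOUSEHOLD_INDEX = {"12 oak st, springfield": "household-7"}
GEO_INDEX = {"94105": "geo-94105"}


def match_record(record, broadest_level):
    """Try the most precise match first, falling back no further than
    `broadest_level`. Returns (level, entity_id), or None if nothing matches."""
    candidates = [
        (MatchLevel.INDIVIDUAL, INDIVIDUAL_INDEX, record.get("email")),
        (MatchLevel.HOUSEHOLD, HOUSEHOLD_INDEX, record.get("postal_address")),
        (MatchLevel.GEO, GEO_INDEX, record.get("zip")),
    ]
    for level, index, key in candidates:
        if level > broadest_level:
            break  # the customer does not accept matches this imprecise
        if key is None:
            continue
        normalized = key.strip().lower()
        if normalized in index:
            return level, index[normalized]
    return None


record = {"email": "unknown@example.com", "zip": "94105"}
high_reach = match_record(record, broadest_level=MatchLevel.GEO)
high_precision = match_record(record, broadest_level=MatchLevel.INDIVIDUAL)
```

With `broadest_level=MatchLevel.GEO` the record above falls through to the geographical match; with `MatchLevel.INDIVIDUAL` it stays unmatched, which is the higher-precision, lower-reach outcome.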


Distribution

Distribution refers to the transfer of customer segments to online destinations like ad networks and data management platforms (DMPs). There are two primary mechanisms for data distribution: client-side and server-side. Client-side transfer means that segment data is communicated via the browser or device, and it requires a pixel fire on the given device. Because pixel fires are a limited resource, distribution requests must be prioritized intelligently. Server-side transfer allows segment data to be communicated directly from our servers to the servers of the destination service. Server-side transfer does not completely eliminate the need for a client pixel fire, so optimized prioritization is still necessary.
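One simple way to picture prioritization under a limited pixel-fire budget is a per-device priority queue. The sketch below is an assumption-laden illustration, not a description of our actual scheduler: the `DistributionQueue` class, the numeric `priority` score, and the idea that each pixel fire can carry a fixed `capacity` of requests are all hypothetical.

```python
import heapq
import itertools


class DistributionQueue:
    """Pending distribution requests per device, served highest priority first."""

    def __init__(self):
        self._heaps = {}                  # device_id -> heap of pending requests
        self._counter = itertools.count() # tie-breaker that preserves FIFO order

    def enqueue(self, device_id, destination, segment, priority):
        heap = self._heaps.setdefault(device_id, [])
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(heap, (-priority, next(self._counter), destination, segment))

    def on_pixel_fire(self, device_id, capacity):
        """Return up to `capacity` of the highest-priority requests for this device."""
        heap = self._heaps.get(device_id, [])
        batch = []
        while heap and len(batch) < capacity:
            _, _, destination, segment = heapq.heappop(heap)
            batch.append((destination, segment))
        return batch


queue = DistributionQueue()
queue.enqueue("device-1", "network-a", "segment-1", priority=1)
queue.enqueue("device-1", "dmp-b", "segment-2", priority=5)
queue.enqueue("device-1", "network-a", "segment-3", priority=3)

first_fire = queue.on_pixel_fire("device-1", capacity=2)
second_fire = queue.on_pixel_fire("device-1", capacity=2)
```

Here the first pixel fire carries the two highest-priority requests and the remaining one waits for the next fire. How priority should actually be scored (deadline, customer commitments, destination limits) is left open.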

Reporting, Security, Privacy, and More

While segmentation, matching, and distribution represent the core functional areas of the onboarding infrastructure, many additional pieces are necessary for the complete service. Log processing, data summarization, and clean user interfaces are all required to provide intuitive reporting to our customers and partners. Ensuring a secure environment for our customers’ and partners’ data and maintaining a clean, privacy-centric architecture are fundamental to our operations. And, of course, operating at “internet scale” (handling billions of daily requests and minimizing latencies over large geographies) requires sophisticated and well-managed infrastructure.