Marketers today often gauge the quality of an identity vendor based on match rates, and accurate match rates are largely determined by whether an identity graph is deterministic or probabilistic. In a previous post, we covered the need to move beyond the overly simplistic and misunderstood framing of deterministic versus probabilistic and realize that both methodologies have their place in identity matching use cases. However, we believe an identity graph’s foundation must be deterministic in order to execute people-based marketing, and it is important to understand how a vendor defines deterministic and applies this science to its graph.
Even if a vendor says its graph is deterministic, that alone is not enough to confirm its quality. Taking vendors at their word is risky, because inaccurate data matching is costly. A mature and sophisticated deterministic graph should be validated to determine its accuracy rate.
Identity Conflicts: How Data Validation Makes or Breaks a Deterministic Graph
Data validation is foundational to ensuring the highest levels of accuracy within an identity graph. While there is no one-size-fits-all approach to validation, a deterministic identity solution that relies on raw, unfiltered matching creates a host of identity conflicts.
Let’s walk through some common scenarios:
- Signups with fake emails, such as [email protected], cause false cookie-to-email matches that may be repeated thousands or millions of times, deterministically grouping all of those devices together. Oracle’s analysis found a single email that was linked to 2.3 million cookies
- A friend who signs in to a website on your browser with his email causes a false cookie-to-email match, which could result in the friend’s devices becoming tied to yours and even your devices becoming tied to members of the friend’s family and their individual networks
- A single email associated with a shared Netflix or Spotify account causes false matches, because a group of friends and/or family access these services across their different personal devices
- Online order shipments to the homes of friends or family can create incorrect directly identifiable personal data linkages, even in a deterministic graph
When identity vendors link deterministic data together for recognition purposes, the false linkages above can aggregate into new links that are all incorrect. The effect of these inaccurate deterministic linkages compounds quickly. If a data onboarding or identity vendor decides to deliver matches using these linkages without any quality controls, the vast majority of the matched devices may be inaccurate. The reach they provide is meaningless. To ensure they are reaching the right audiences, marketers need to demand that deterministic graphs are mature and filtered for accuracy—not raw and unfiltered.
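To make the compounding effect concrete, here is a minimal sketch that treats each cookie-to-email observation as an edge and groups identifiers into connected components. The identifiers, the `cluster` helper, and the union-find approach are illustrative assumptions, not a description of how any particular vendor builds its graph.

```python
from collections import defaultdict


def cluster(links):
    """Group identifiers into connected components using a simple union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical observations: (cookie, email) pairs for two different people.
clean_links = [
    ("cookie_A1", "alice@example.com"),
    ("cookie_A2", "alice@example.com"),
    ("cookie_B1", "bob@example.com"),
]
# Bob signs in once on Alice's browser, creating one false match.
bad_link = ("cookie_A1", "bob@example.com")

clusters_clean = cluster(clean_links)
clusters_merged = cluster(clean_links + [bad_link])
print(len(clusters_clean), "clusters before the bad link")
print(len(clusters_merged), "cluster(s) after the bad link")
```

In this toy data, a single false sign-in is enough to merge two people into one cluster, which is why unfiltered deterministic links can degrade so quickly at scale.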
How Leading Deterministic Identity Graphs Deliver Accurate Matches
A leading identity vendor does not naively take deterministic matches at face value. A best-in-class deterministic graph should encompass the following measures to ensure accurate matching:
Knowledge Base of Directly Identifiable Personal Data Linkages
The graph should be built with an expansive directly identifiable personal data reference base to allow linking of directly identifiable personal data touchpoints and devices to a persistent representation of a person. This linking should rely predominantly on directly observed linkages, meaning that devices are only linked when they are directly observed using the directly identifiable personal data tied to a consumer. Linkages that are not based on directly identifiable personal data need to be curated because they can be inaccurate.
Leading deterministic graphs prioritize accuracy and recency of identity over scale. In environments where there may not be enough data to build a fully deterministic graph, probabilistic models based on deterministic links still outperform graphs that are not built on a deterministic, directly identifiable personal data-based reference base.
People-Based Identifiers
The graph should have a varied and unbiased collection of source partners whose data serves as a corroboration check on one another. The graph’s persistent identifier should be based on verified people-based data, with machine learning used to weigh the value of these different sources. It should also allow incoming offline linkages, like address and phone number, to be corroborated against verified people-based IDs.
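As a rough illustration of how corroboration across sources might be scored, the sketch below combines per-source reliability weights with a noisy-OR rule. The weights, the source names, and the noisy-OR formula itself are assumptions chosen for simplicity; in practice the weights would come from a trained model, and the post does not specify one.

```python
import math

# Hypothetical reliability weights per source partner (assumed, not learned here).
SOURCE_WEIGHTS = {"partner_a": 0.6, "partner_b": 0.5, "partner_c": 0.2}


def link_confidence(sources, weights=SOURCE_WEIGHTS):
    """Noisy-OR score: chance that at least one reporting source is right,
    treating sources as independent. Unknown sources contribute nothing."""
    miss = math.prod(1.0 - weights.get(source, 0.0) for source in set(sources))
    return 1.0 - miss


single = link_confidence(["partner_c"])
corroborated = link_confidence(["partner_a", "partner_b"])
print(round(single, 2), round(corroborated, 2))
```

Under these assumed weights, a link reported by two moderately trusted partners scores higher than one reported by a single weak partner, which is the behavior a corroboration check is meant to capture.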
Accuracy Filters
To stay current, the graph should periodically filter out linkages that have not been validated for an extended period, as well as problematic data sources whose linkages are not corroborated by other sources. It should also compare incoming emails with a larger match network to remove problematic ones.
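The filters described here could be sketched roughly as follows. The `Link` record, the thresholds, and the fan-out cap used to flag problematic emails are all hypothetical. Corroboration is also simplified to a per-link check, rather than scoring whole data sources.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Link:
    email: str
    device: str
    source: str
    last_seen: date


def filter_links(links, today, max_age_days=180, max_devices_per_email=20, min_sources=2):
    """Return (email, device) pairs that pass recency, fan-out, and corroboration checks."""
    cutoff = today - timedelta(days=max_age_days)

    # 1. Recency: drop links not observed within the validation window.
    fresh = [link for link in links if link.last_seen >= cutoff]

    # 2. Fan-out: drop emails tied to an implausible number of devices.
    devices = defaultdict(set)
    for link in fresh:
        devices[link.email].add(link.device)
    sane = [link for link in fresh if len(devices[link.email]) <= max_devices_per_email]

    # 3. Corroboration: keep pairs reported by at least `min_sources` distinct sources.
    sources = defaultdict(set)
    for link in sane:
        sources[(link.email, link.device)].add(link.source)
    return sorted(pair for pair, seen_by in sources.items() if len(seen_by) >= min_sources)


today = date(2024, 1, 1)
recent = date(2023, 12, 1)
links = [
    Link("alice@example.com", "phone_1", "partner_a", recent),
    Link("alice@example.com", "phone_1", "partner_b", recent),
    Link("alice@example.com", "laptop_9", "partner_a", recent),     # single source only
    Link("bob@example.com", "tablet_2", "partner_a", date(2022, 1, 1)),  # stale
    Link("bob@example.com", "tablet_2", "partner_b", recent),
]
# A placeholder email shared across many devices, each seen by two sources.
for n in range(3):
    for source in ("partner_a", "partner_b"):
        links.append(Link("fake@example.com", f"device_{n}", source, recent))

kept = filter_links(links, today, max_devices_per_email=2)
print(kept)
```

The order of the checks matters in this sketch: the stale partner_a record for Bob is removed first, so his remaining link no longer has two corroborating sources and is dropped as well.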
When a vendor uses these measures to filter the raw feed of deterministic linkages coming in, they are effectively converting iron ore into stainless steel.
The Power of Accurate Deterministic Matching
Our belief is that deterministic matches alone do not guarantee quality. A viable and accurate identity graph must have live deterministic match evidence as a starting point. Marketers who demand a higher level of quality and accuracy from their deterministic graphs will be better positioned to deliver relevant experiences across channels.