Data Science

The Next Generation of Privacy Technology

April 20, 2021  |   Randall G.

Transcript:

Recently, LiveRamp acquired DataFleets, furthering our ability to keep useful data private, and private data useful.

In this episode of Saying the Quiet Part Out Loud, DataFleets CEO and Co-founder David Gilmore, who is now LiveRamp’s Head of Privacy Technology Solutions, joins Anneka to dig into DataFleets’ cutting-edge technology and how it provides enterprises with more flexible options for privacy and distributed data collaboration. 

Anneka:
Welcome to Saying the Quiet Part Out Loud, a podcast from LiveRamp that uncovers what’s unsaid about technology, data, and business, and explores how they intersect. My name is Anneka Gupta, President and Head of Products and Platforms at LiveRamp. I’m taking over for Daniella on hosting duties today.
Today I’m super excited to be joined by David Gilmore, Founder and CEO of DataFleets, which was recently acquired by LiveRamp. David is now Head of Privacy Tech Solutions. Welcome, David.

David:
Hi Anneka. It’s a pleasure to be here.

Anneka:
I would love to start out by asking you to tell us a bit more about your background and how you founded DataFleets.

David:
Absolutely. Happy to share. So before starting DataFleets, I was working as a data scientist, and actually I got to work with a lot of different sensitive datasets across health care, national security, investment banking for anti-financial crime, and also in law enforcement to help stop human trafficking. And really the core idea that inspired DataFleets was when I was working on a health care project to perform machine-learning analytics to identify early-stage cancer. We were successful in developing some analytics that were able to—with superhuman accuracy—find cancer so that the patients could then be prioritized for treatment and ultimately make their outcomes more successful.
While that project was very successful—it’s running now in 300 hospitals around the United States—I left wondering, why don’t we see more initiatives like this? And the reason was because the data was locked away. We were only using a fraction of the data—the tip of the iceberg—of what was available for analytics. And so much of that was really down to the privacy and sensitivity of those datasets.
That is what raised the question for me: why is it this way? Maybe we can actually engineer technical solutions to overcome this divide between private, sensitive data and impactful analytics that really improve life for consumers across the board. That’s what we’ve done with DataFleets.

Anneka:
That’s awesome. I’d love for you to share how you went about your startup journey—beginning with one product idea that evolves over time as you test in market and get customer feedback. Where did DataFleets start in trying to solve this problem? How did that evolve over the first couple of years?

David:
So we knew from the beginning that unlocking data was the core thesis around DataFleets, and we wanted to do so in a way that preserves privacy and provides ways to make accessing data secure. But what exactly that would look like and its manifestation wasn’t quite clear.
So the way we went about that was through extensive customer discovery across many different industries. At the time, my co-founder and I were at Stanford, and we had access to a lot of different institutions. We did discovery interviews in health care, technology, financial business, etc., and used that to develop a thesis of what the real pain points around privacy and analytics were in institutions in order to build what needed to be an enterprise-grade solution to solve this for our customers.

Anneka:
And what did that end up manifesting as looking like? Can you give us a little bit of background on the technology that you built, how it works, and use cases that it solves for today?

David:
Absolutely. So there were two things that emerged throughout that process, and this was when it was just the co-founders. What we saw was that the technical approaches to this problem were many. This problem of private, locked-up datasets has existed for decades. And there have been multiple different technical solutions that try to attack different parts of the problem, each of which had its own shortcomings.
What we realized by also consulting across these different research domains is that you can actually merge together multiple different approaches to create a solution that’s greater than the sum of the parts. And we realized from talking with customers that this is necessary in order to truly build a platform that is going to be successful for them. So, that’s one thing we learned. Another thing we learned was that the approaches for privacy-enhancing computation and analytics are deeply technical. They rely on a lot of really in-depth research.
And most of the approaches in this particular domain had been built by scientists, and scientists are great at a lot of things. They’re great at having great insights and coming up with novel approaches for how technology can be built. But to actually build those things to scale you need engineers and individuals who have the battle scars from developing production systems.
Our approach was to find the best engineers with those battle scars and bring them together in order to essentially take all of these approaches out of the lab and put them into practice in ways that can fit into an enterprise’s architecture. So, that was the way we built.
And now with those two insights, the multidisciplinary approach and having the battle-tested engineering team that are tackling the productionization of these techniques, what we came up with is a single cloud data platform that’s able to unify sensitive and distributed datasets for analytics. It has two capabilities.
The first is privacy, and that privacy acts like a firewall that sits in between the data and its user such that the user can ask any question of that data, whether it be business analytics or advanced analytics like machine learning, and all the while we have mathematical definitions of privacy that guarantee the data stays safe. That’s the privacy capability.
The second one was the federation capability, which, in our data platform, allows multiple databases to be connected with one another, all of which are protected by this privacy firewall, which allows multiple institutions, or multiple lines of business, or even multinational companies with datasets across multiple different countries, to unify those distributed and sensitive datasets for a single point of access with any sort of analytical capability.
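
To make the two capabilities a little more concrete, here is a minimal sketch in Python. It is not DataFleets' implementation, which the conversation does not describe. It assumes that the "mathematical definition of privacy" is differential privacy, implemented with the Laplace mechanism, and it models federation as summing noisy counts from several data holders. The names `PrivateSite`, `noisy_count`, and `federated_count` are hypothetical, and the hospital rows are made-up toy data.

```python
import random


class PrivateSite:
    """One data holder. Raw rows stay inside; only noisy aggregates leave.

    Hypothetical illustration: epsilon is the differential-privacy budget
    spent on each query (smaller epsilon means more noise, more privacy).
    """

    def __init__(self, rows, epsilon, seed=None):
        self._rows = list(rows)
        self._epsilon = epsilon
        self._rng = random.Random(seed)

    def noisy_count(self, predicate):
        """Answer 'how many rows match?' with Laplace noise added."""
        true_count = sum(1 for row in self._rows if predicate(row))
        # Adding or removing one person changes a count by at most 1,
        # so the sensitivity is 1 and the Laplace scale is 1 / epsilon.
        # The difference of two exponentials with rate epsilon is a
        # Laplace(0, 1 / epsilon) sample.
        noise = (self._rng.expovariate(self._epsilon)
                 - self._rng.expovariate(self._epsilon))
        return true_count + noise


def federated_count(sites, predicate):
    """Combine per-site noisy counts; no site ever shares its rows."""
    return sum(site.noisy_count(predicate) for site in sites)


# Toy usage: two hospitals answer the same question without pooling rows.
hospital_a = PrivateSite(
    [{"age": 71, "icu": True}, {"age": 34, "icu": False}], epsilon=1.0, seed=1)
hospital_b = PrivateSite(
    [{"age": 66, "icu": True}, {"age": 59, "icu": True}], epsilon=1.0, seed=2)

estimate = federated_count([hospital_a, hospital_b], lambda row: row["icu"])
print(f"Noisy ICU count across both sites: {estimate:.2f}")
```

The analyst only ever receives `estimate`, never the rows. A production system would also need to track the total privacy budget spent across queries, which this sketch leaves out.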

Anneka:
That’s super interesting technology. Can you also share an example of an actual customer use case that you’ve been able to solve with this technology that otherwise would be very difficult to execute on if this technology and approach didn’t exist?

David:
Absolutely. One of the examples that was very special to me, based on where this initial insight had come from in health care, was powering a COVID-19 research consortium for Hospital Corporation of America (HCA). That consortium, called the CHARGE Consortium, has the largest inpatient and ICU database of COVID-positive patients in the United States. And today, 11 other research institutions, including Johns Hopkins, Harvard, AHRQ, and eight others, use HCA’s data powered by DataFleets’ privacy-preserving access layer to answer critical research questions about patients.
This type of research has not been possible before because the nature of ICU and inpatient data from hospitals is that it’s highly, highly sensitive information, and the previous approaches to this problem simply degraded the analytics quality far too much for that data to be useful. But now what we’ve done is unlock that data at HCA, such that some of the best epidemiologists in the world are able to access it and answer critical research questions that ultimately can improve care for COVID-19 patients.

Anneka:
Super, super interesting and impactful use case. Well, now that you’re part of the LiveRamp family, I know one of the big things that you’re working on is incorporating DataFleets’ technology into the way that we’re approaching Safe Haven and data collaboration for our customers. What are you most excited about in this next phase of the journey together?

David:
Yeah, it’s hard to say what I’m most excited about because the list is long. I think a lot of the things that we’re doing together, as DataFleets and LiveRamp, impact a number of the hot-button issues that society faces globally today. I really mean that. And so I’ll touch on what a few of those look like. There’s no way we’ll get through all of them.
One of the first things that I think about is this growing tension for consumers, where consumers on the one hand increasingly demand that their data be kept private and confidential, and that it be treated with respect and care. And by and large, in the data economy that’s existed for the past decade or so, that bargain with the institutions with whom they share the data has not been held up. And so they are demanding trust and respect from the institutions they do business with. So, that’s the privacy concern.
On the other hand, consumers also are expecting a higher degree of personalization from the brands that they love. And in order for brands to be competitive, they have to learn how to personalize. They have to be able to really learn what their customers want and get inside their heads, so they can deliver the best possible products and experiences. It turns out that to accomplish that, those organizations need data.
And so this is where the tension arises, where we have a privacy preference and a personalization preference. So how are we going to manage this tug-of-war that emerges between the two? The beauty of having privacy-enhancing technologies come into the picture is that’s what they’re explicitly designed to do. They’re there to alleviate the tension between privacy and the utility, or the analytics insights that we can derive from data.
And this leads me to one of the things I’m most excited about: privacy-preserving personalization at scale. We’re never going to have a perfect trade-off between personalization and privacy. There’s always going to be some compromise, but what we can do is optimize and elevate what that trade-off looks like between them, so we can start to have our cake and eat it too, while giving consumers both of these preferences. That’s the first thing that I’m super excited about.
It’s also related to the second hot-button issue in society that I think we’re really at the center of here, where you have the data haves and the data have-nots, and it’s increasingly difficult for the data have-nots to remain competitive and connect with their end customers. What we’re doing is changing that and alleviating this tension. We’re leveling the playing field because by introducing ways for institutions to collaborate with one another, we’re allowing institutions to be able to access more data about their consumers and their customers so that they can perform that personalization and start to compete and level the playing field. I want to see a future in which we have a level playing field, but we also have protected consumer privacy at the same time.

Anneka:
Yeah. When you talk about the data have-nots, it’s really interesting. What is an example of a type of company that potentially falls into that data have-nots category, and how can Safe Haven powered by the DataFleets technology help solve that problem?

David:
Yeah. So, if we look at the types of companies that have a lot of data right now, we’re talking by and large about the tech giants, but then we also have the brick-and-mortar retailers, which have physical points of contact with their customers. And then on the other side of the coin, we have some of the data have-not companies, or maybe certain brands that struggle to actually get that touch point with the customer through interacting with retail datasets, because let’s be real, the type of data the retailers are willing to give to the brands for those insights is limited.
By enabling this privacy protection, we’re able to facilitate configurable trust between those entities so that those brands can start to get insights about the customer, and start to understand these customers. How do we reach these customers?
Similarly, those brands have found it increasingly important to measure the effectiveness of their campaigns to reach these customers and those audiences. We’re able to start connecting them more directly with publishers that have the platforms to engage with these consumers so brands can start to measure the effectiveness of those campaigns and look at things like return on ad spend without having to compromise the privacy of all of those consumers who are actually viewing those ads.

Anneka:
The technology that you and your teams have built is incredibly exciting. It’s incredibly revolutionary. It’s really going to change the whole game for how companies and enterprises across industries do analytics and data science. I’m curious to get your perspective as a thought leader in this space. How do you see analytics and data science changing in the future?

David:
I think we’re going through some really exciting times in the fields of machine learning, advanced analytics, and data analysis in general. Of course, the need for it is becoming ubiquitous. And in the past decade, we’ve proven that some of these advanced analytics capabilities, like deep learning for example, are viable, and we’re starting to see the fruits of that with self-driving car pilots actually being rolled out, right? And seeing facial recognition become commonplace. I authenticate to my phone now using facial recognition. So computer vision has really become a part of our lives, same with other types of analytics. So we’ve gotten very far recently in these analytics. And the reality is the growth is going to be exponential in terms of the capabilities here. And I think the key thing to realize though is that all of these analytics are data hungry.
The way we’ve had these breakthroughs is having higher volumes of data and having the computer processing to be able to manage that data. And by bringing these things together, suddenly we’ve seen extraordinary results. We’ve only, however, tapped maybe 1% of the data that’s available to us because the rest of that data is locked away. And a lot of that is because of these various sensitivities we’re talking about around privacy preferences, which is rightfully so.
What I see as the future is a world in which, instead of downloading a dataset that has been curated, or maybe is out of date and is not a dataset that’s representative of the actual domain that I want to model, we’re instead going to be able to access data more freely from other institutions that’s going to be live, and timely, and real, so we can start answering relevant questions.
What we need to do is solve the issues that really plague society today. And we have the tools, we have the technology, and we have the data—we just haven’t brought them all together. And by having this new type of infrastructure in the data economy to connect data with amazing talent to answer these questions, we’re going to see rapid advancement of scientific research and an acceleration of advanced analytics and their impact on society.

Anneka:
Yeah. That’s a super inspiring vision. I think personally, one of the things I’m really excited about, as more datasets get unlocked to provide real benefit to society, is the way that it can advance diversity, equity, and inclusion, which is a topic that we talk about a lot in this podcast, because with so much of the data about different parts of the population right now, there are major holes in the datasets that exist and are accessible. And there’s so many interesting applications of data and machine learning, and AI specifically, to solve some of these big societal problems that have persisted for a long time because of the lack of data.

David:
One of the concrete examples that surfaced when we were doing discovery in financial services is this need to be able to lend in a way that protects the privacy of the lendees, those who are being underwritten, but then also to be able to measure the bias of any automated decision-making capability. And so, of course, these days, these lenders are using machine-learning analytics, and they have to protect the privacy of these consumers, which is to say the actual data scientists working on the underwriting algorithms cannot be looking at the sensitive information about these consumers. So specifically they can’t look at protected class information, for example, sexual orientation, religion, criminal history, these sorts of things. But on the flip side, these data scientists are held accountable to make sure that their models don’t discriminate against anyone for that reason. So their hands are tied. There’s no way they can actually benchmark their models without seeing that data to know whether or not they’re discriminating.
And so enter privacy-preserving analytics, and suddenly they can access data and perform machine-learning modeling on a population of consumers, and they don’t see any of the private information, but they can also benchmark and see if the automated decisioning models that they’re creating are biased against any protected class. And so they can make sure that what they’re doing is lending in a way that is fair, and they can do it in a way that’s also preserving consumers’ privacy.
So this is just one example, concretely, for a major problem in financial services that we can solve now, because we have privacy-preserving analytics. And the end result is that the lending decisions that are made can be made in a way that’s fair, in a way that also preserves privacy, which is super exciting. And that’s just the tip of the iceberg for things we can do for society to help with diversity, inclusion, and belonging.
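
As a rough illustration of the lending example, the sketch below shows how a bias check could be exposed as an aggregate-only function: it runs where the protected-class labels live and hands back a single number, so the data scientist never sees anyone's label. This is an assumption about how such a check might look, not a description of DataFleets' product. The function name `approval_rate_gap`, the `min_group_size` threshold, and the choice of metric (the gap in approval rates between groups, often called demographic parity difference) are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def approval_rate_gap(decisions, groups, min_group_size=20):
    """Return the largest gap in approval rate between protected groups.

    decisions: iterable of booleans, True if the model approved the loan.
    groups:    iterable of protected-class labels, aligned with decisions.

    Only the aggregate gap is returned. Groups smaller than
    min_group_size are ignored so that a tiny group cannot expose an
    individual. Returns None if fewer than two groups are large enough.
    """
    decisions = list(decisions)
    groups = list(groups)
    if len(decisions) != len(groups):
        raise ValueError("decisions and groups must have the same length")

    totals = defaultdict(int)
    approvals = defaultdict(int)
    for approved, group in zip(decisions, groups):
        totals[group] += 1
        approvals[group] += int(approved)

    rates = [approvals[g] / n for g, n in totals.items() if n >= min_group_size]
    if len(rates) < 2:
        return None
    return max(rates) - min(rates)


# Toy usage with made-up data: group "a" is approved 30 times out of 40,
# group "b" 20 times out of 40.
toy_decisions = [True] * 30 + [False] * 10 + [True] * 20 + [False] * 20
toy_groups = ["a"] * 40 + ["b"] * 40
print(approval_rate_gap(toy_decisions, toy_groups))
```

A gap near zero suggests the model approves the groups at similar rates. In a real privacy-preserving system the returned value would typically also be noised or otherwise protected, which this sketch does not do.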

Anneka:
That’s excellent. What a fascinating and incredibly inspiring use case. So switching gears, I’d love to move to the more personal side. We’re coming up on, hopefully, knock on wood, the end of the pandemic with broad vaccine availability starting to roll out, at least across all of the U.S. But as you reflect back on the past year, I would love to hear from you, how have you managed through this past unprecedented year? Do you have any tips and tricks, advice, or stories that you’d like to share?

David:
Yeah. It’s been a wild ride this past year. There’s been the pandemic, of course, which has affected us all. And at the same time, running a startup, iterating quickly, 80-hour work weeks, is always going to be intense. I had some other personal things going on at the same time.
My wife and I got married, which was an adventure in its own right. She’s from Europe, and her fiancé visa came through right before the lockdowns took place and the borders were closed. She was on the last flight out of the UK. And then we had to get married within a short time frame. All of the county clerks were closed, so we had to desperately find a county clerk and do a last-minute elopement. And then we moved, and then we got bed bugs actually, all during the pandemic. And we had to get rid of all of our furniture and move again, and slept on an air mattress for a while. So it’s been a very stressful past year, I would say, but it’s exciting nonetheless.
I’d say the thing that has been the most constant stress reliever has just been getting outside, going on walks, going on hikes, especially here in lovely California, getting out amongst the forests, or along the coast has been one of the best medicines, so to speak, to just take a deep breath amidst everything seemingly happening all at the same time. The temptation is there to just binge Netflix in those moments, but making the time to instead get outside and have some activity has really been a good recipe for success for us.

Anneka:
I think so many of us in the pandemic have found that exercise, getting outside, reconnecting with nature, has been a real blessing and opportunity.

David:
Absolutely.

Anneka:
Glad that you’ve been able to do that. Well, David, thank you so much for joining me for this podcast. It was great to hear your story, hear about the DataFleets technology in more detail, and also just hear your thoughts about the future of data science, analytics, and machine learning.

David:
It’s been a pleasure, Anneka. I’m so excited about the road ahead and what we’re going to do together. Thanks.

Anneka:
Thanks for listening to Saying the Quiet Part Out Loud. Be sure to rate and subscribe on Apple Podcasts, Spotify, or however you listen to podcasts.