Engineering Blog

Engineering

Learn from our challenges and triumphs as our talented engineering team offers insights for discussion and sharing.

Utilizing MapReduce Combiners and HyperLogLog++ to process millions of queries over datasets with billions of records

The fundamental challenge being solved is aggregating and counting the unique values in a large multiset. Here is a matrix visualization of one dataset:  The goal is to be able to retrieve counts of unique IDs with different constraints. For example (using SQL-like syntax to illustrate), computing COUNT (DISTINCT ID) WHERE (Field 1 = 0) ...