Computing Distributions of Large Datasets with Cascading and the q-digest Algorithm
Distributions are a powerful tool for understanding datasets. As an example, imagine that you’re interested in quantifying user engagement for a new app you’re developing. To this end, you compute the distribution of monthly engagement time for your users and discover the following trends:
You learn that most of your “users” rarely spend any time using ...