Insight through Segmentation

Gauge.io
Dec 10, 2021

For the last three months, Gauge has been mentoring two students from UC Berkeley’s Data Science department, Ben Skinner and Rafael Shifren. This fall, they worked together to answer one question: how could Gauge.io’s Anchorbox platform be made more configurable for the user? Anchorbox, a market segmentation tool, takes in a CSV file and analyzes it through two separate visualizations, Clustering and Weighting (as shown below). These views of the segmentation give users more visibility into their data and, ultimately, into the unique attributes of their consumers.

Anchorbox Clustering

Feature Development Research

Before deciding on a feature to develop, the students needed to research which processes market researchers use today and, more importantly, decide which features missing from the market researcher’s toolkit would be most beneficial for Anchorbox to implement. Through conversations with industry experts, they gathered a rough list of potential features that would bolster Anchorbox’s platform. These experts came from industries ranging from car dealerships to apparel distributors, providing an initial picture of how segmentation is used across multiple industries.

After some careful thought and discussion, the Berkeley team decided that the most helpful feature Anchorbox was missing was the ability for the user to select a specific number of clusters. Currently, a user manually adjusts the ‘Sensitivity’ and ‘Distribution’ sliders, and Anchorbox automatically assigns a number of clusters to their dataset. While this is a great feature of Anchorbox, the algorithm does not allow one to preset the number of clusters, and it can be hard to understand for someone who isn’t as technically proficient. The option of predetermining the number of segments serves specific use cases and will help users extract more information from the resulting visualizations.

Anchorbox Weighting

Feature Development

Those who have experience working with clustering will likely be familiar with two common classes of algorithms: density-based clustering and partition-based clustering. Anchorbox, in its current form, applies the density-based clustering algorithm HDBSCAN to a dimensionally reduced form of the user’s survey responses. HDBSCAN extends the functionality of the well-known DBSCAN by building a hierarchy of potential clusters before settling on a final clustering. It is an excellent way for users to cluster and visualize their survey responses without having prior information about how many clusters they expect to see in Anchorbox’s visualizations.

The additional feature that the Berkeley team has been working to develop stems from the interest of many users, especially those with established distribution networks, in verifying their understanding of how they should be segmenting their customer groups or poll respondents. For this, we focused on the dimensionality reduction algorithm UMAP, which currently supports HDBSCAN in Anchorbox, and used it to assist a partitioning algorithm that allows users to pre-specify the number of clusters to visualize. Since we both have experience with the K-means clustering algorithm from our Berkeley coursework, we chose to implement this unsupervised learning method as our alternative to HDBSCAN. In the updated Anchorbox, users will be able to select whether they would like to pre-specify their number of clusters or allow the algorithm to automatically detect the number of clusters, using K-means and HDBSCAN respectively.
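To make the partitioning idea concrete, here is a minimal sketch of K-means (Lloyd’s algorithm) with a pre-specified number of clusters, written with only the Python standard library. It is an illustration, not Anchorbox’s backend: the function name `kmeans`, its parameters, and the toy 2-D points (standing in for UMAP-reduced survey responses) are all assumptions made for this example.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Hypothetical helper: cluster coordinate tuples into k groups.

    Returns (labels, centroids), where labels[i] is the cluster index
    assigned to points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    labels = None

    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so we have converged.
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # Leave an empty cluster's centroid where it is.
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))

    return labels, centroids


# Toy stand-in for dimensionally reduced survey responses (assumed data).
responses = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
labels, centroids = kmeans(responses, k=2)
print(labels)
```

Unlike a density-based method, this approach always returns exactly `k` groups, which is the property that lets a user check a segmentation they already have in mind.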

Adding a dropdown for manually specifying segments

After implementing our algorithm in the backend, we had to alter the user interface, as shown above with the ‘Segments’ option. When ‘Segments’ is set to ‘Auto’, Anchorbox runs HDBSCAN. Otherwise, when a numerical value is assigned, it runs an open-source K-means implementation.
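The ‘Segments’ rule described above can be sketched as a small dispatch function. This is a hedged illustration rather than Anchorbox’s actual code: the function name `resolve_segments` and the returned algorithm labels are hypothetical, chosen only to show how an ‘Auto’ value and a numerical value might be routed differently.

```python
def resolve_segments(setting):
    """Hypothetical helper: map the 'Segments' value to a clustering choice.

    Returns ("hdbscan", None) for 'Auto', or ("kmeans", k) for a
    positive whole number k.
    """
    text = str(setting).strip()
    if text.lower() == "auto":
        # No cluster count is supplied; the density-based method decides.
        return ("hdbscan", None)

    k = int(text)  # Raises ValueError for anything that is not a whole number.
    if k < 1:
        raise ValueError("Segments must be 'Auto' or a positive whole number")
    return ("kmeans", k)


print(resolve_segments("Auto"))
print(resolve_segments(4))
```

Keeping this decision in one place means the rest of the pipeline only has to handle two cases: a fixed cluster count or none at all.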

Impact & Next Steps

Giving users the ability to control the number of segments in Anchorbox’s algorithm makes the platform more configurable for each individual user. Specifically, the platform can reveal more about the hidden relationships contained within consumer data, which can in turn inform new market strategies. Adding this feature makes Gauge.io’s Anchorbox a unique platform that gives potential and current users the ability to derive informative visualizations, uncovering more of the behavior and traits of their customers.

Gauge.io

Gauge is an integrated, user-centric consultancy of ethnographers, designers, data scientists and engineers, largely based in the San Francisco Bay Area.