Bloomberg Data License
Bloomberg Enterprise sought to expand its share of the growing market for AI/ML training data by shortening the sales cycle for its Data License product.
Pavel Samsonov
To comply with non-disclosure agreements, all confidential information has been omitted from this case study.
Data Scientists need to discover valuable data, purchase it, clean it, and plug it into their models before they can return for more data. The Global Head of Enterprise Data set time-to-revenue as a north star metric that would capture the health of the business across these dimensions.
The incumbent process for discovering and downloading data was so complex that the existing users of the data - quantitative researchers, or “quants” - could not self-serve. Instead, each customer's Market Data group worked with Bloomberg’s Content team to set up a single large, once-a-day data request. This intermediary process was the main cause of our high time-to-revenue (TTR).
I worked with the Content team to identify which intermediate steps could be cut to bring customers closer to our data. We defined a set of core needs that let us sidestep most of the complexity accrued over 20 years of product evolution. The BI team identified a cohort of customers who used only the core capabilities, and my team rapidly tested prototypes with them to arrive at an impactful initial scope that reduced TTR by 75% when released.
Data License is a Bloomberg Enterprise business for non-real-time data, delivered in two flavors: Bulk and Custom datasets. I joined Bloomberg as the business was modernizing Data License from a legacy SFTP-driven backend to a modern HTTP-based transport layer.
The Global Head of Enterprise Data saw custom data as an opportunity to grow the business. While the bulk data side was relatively straightforward, custom datasets offered significant room for improvement due to the complexity of the product.
The North Star metric for this strategy was Time-to-Revenue. Reducing the amount of time it took for a data team to gain value from a dataset would mean that they could come back for more sooner.
My focus was on the discoverability and self-service layer. A back-end PM and a Data PM would take care of the remaining layers, in accordance with the user experience requirements I specified.
Bloomberg was a decades-old business with inconsistently documented products developed across ad-hoc use cases and legacy systems. I began by creating a model of the current state of both available offerings and customer needs.
One of Bloomberg's competitive advantages is its high-touch client relations. I sought out a variety of specialists who could tell me about the client from multiple angles.
I also reached out to customers - both internal and external users of the tool. I used the PACT analysis framework (People, Activities, Contexts, Technologies) throughout my conversations to synthesize their different perspectives and expertise into a complete picture of our users.
My research revealed the extent to which the current state of Bloomberg Data required white-glove service. Over the years, the specification code for custom datasets had evolved to the complexity of a programming language. Users who did not want to take a week of training to learn it had to use the Request Builder, downloadable software that was difficult to deploy in an Enterprise environment and complex to use.
As a result, the Market Data groups of Bloomberg customers had to take on a liaison role with Bloomberg’s Content team. Instead of focusing on acquiring new data products, Market Data acted as an intermediary - collecting data requests from quants via Excel spreadsheets, having Content assemble them into a dataset specification, and then cutting up the large file that came back and distributing the pieces to their internal customers. The Excel sheets also made Market Data’s core job - maintaining a single source of truth for the company’s data - frustrating, manual, and time-consuming.
Even though these datasets were generated daily, most customers could not actually use the system scheduler. Requests in the SFTP system could not be edited; no matter how minor the change, customers had to cancel the request and re-send it. Many customers ended up developing their own bespoke scheduling software. This was especially challenging because the data delivered over SFTP bore little resemblance to what appeared in Bloomberg’s data catalogs. Together, these issues imposed a steep integration cost on new customers adopting Bloomberg systems.
This process repeated not only for large datasets that were updated once per day, but also for small datasets that quants wanted to pull throughout the day to understand the “shape” of the data and see what they could do with it before buying the full dataset or time series. The slower quants got their requested data back, the fewer datasets they would end up sampling and buying.
To remove this additional drain on the Market Data team’s time so they could focus on sourcing new data, we took a two-pronged approach. We would create a best-practices version of what our customers were already doing - such as the scheduler and the source of truth - directly on the website. We would also, where possible, create a self-service capability for data scientists to sample, download, and even purchase data without needing to go through Market Data or Content.
While stakeholders pushed for a 1:1 replacement of every SFTP feature, the MVP release would have to focus on a subset of users. Together with the Content and BI teams, we defined a set of core needs that let us sidestep most of the complexity accrued over 20 years of product evolution, while still covering over 90% of users.
We would simplify the Market Data team’s job of keeping a single source of truth by building Resources directly into Data License. We would also create a catalog that was identical across the GUI and the API, so that command-line users and website users could both manipulate it. That way we would remove the barrier to self-service, leaving Market Data to focus full-time on purchasing data.
One resource could also be used to combine the sampling and the scheduled dataset use cases. A quant could define a set of fields and test it with a small range of securities and a shallow time series, and then Market Data could refer to that same list while expanding the universe of securities.
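To make the resource concept concrete, here is a minimal sketch of how a quant might define a reusable field list, sample it against a small test universe, and let Market Data reuse the same field list for the full scheduled dataset. The base URL, endpoint paths, payload fields, and authentication shown here are illustrative assumptions, not the actual Data License API.

```python
import requests

# Illustrative sketch only: URLs, payload fields, and auth are assumptions.
BASE = "https://example.bloomberg.com/eap/catalogs/my-firm"
session = requests.Session()
session.headers["Authorization"] = "Bearer <token>"  # placeholder credential

# A quant defines a reusable field list once...
session.post(f"{BASE}/fieldLists/", json={
    "identifier": "pricingFields",
    "contains": ["PX_LAST", "PX_VOLUME", "CUR_MKT_CAP"],
})

# ...and tests it against a small universe with a shallow time series.
session.post(f"{BASE}/universes/", json={
    "identifier": "sampleUniverse",
    "contains": [{"identifierType": "TICKER", "identifierValue": "IBM US Equity"}],
})
session.post(f"{BASE}/requests/", json={
    "identifier": "sampleRun",
    "universe": "sampleUniverse",
    "fieldList": "pricingFields",
    "runtimeOptions": {"historyRange": "1M"},
})

# Market Data later expands the universe of securities, while the same
# field list resource keeps the scheduled dataset consistent with the sample.
session.post(f"{BASE}/requests/", json={
    "identifier": "dailyPricingRun",
    "universe": "fullPortfolioUniverse",  # maintained as the source of truth
    "fieldList": "pricingFields",         # same reusable resource
    "trigger": {"frequency": "daily"},
})
```

Because each resource is just a catalog entry, the same objects can be edited from the website GUI or from a script like this one.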
I developed two low-fidelity designs to put these concepts into practice and see how they changed the workflows of our users. The first design treated the request as a shopping cart, with the user selecting resources from the data catalog. The second design treated resources and elements as interchangeable in a tag-style UI.
Users enjoyed the simplified interface that allowed them to focus on the content of the request. However, they found it challenging to navigate the list of all available resources when it grew longer than a few items. Within the resources themselves, users wanted to see their instruments and fields in a table format that let them easily manage lists of hundreds to hundreds of thousands of items. Seeing a small number of identifiers from each resource was not valuable to them.
I designed an updated workflow incorporating this feedback and raised the fidelity of the designs. As they began to match the users' mental model more closely, I pivoted from gathering qualitative feedback to measuring task time, error rates, and success rates.
I worked with the engineering team to break the scope down into implementable user stories, and created a phased rollout plan for the desired capabilities. I worked with the Content and Sales teams to slowly onboard a small number of users to test our assumptions about real-world use in a controlled environment. We would track usage through Business Intelligence and BEAP analytics, as well as through regular touch-bases with the trial users.
The new Custom Dataset interface minimizes the amount of effort necessary to make consistent data requests, by leveraging the ability to create and reuse component resources. Users were able to set up an error-free request in under one minute without having ever used the original Request Builder interface, and with no training on the tool.
Whether creating a request for only a few instruments or managing a portfolio of hundreds of thousands, users only need to define a resource once, and can update it once to propagate changes across all datasets that use the resource as part of their schema. Clients no longer need to cancel recurring requests, update their instruments, and then re-submit them all. Overall, the new workflow takes over 95% less time than before.
Because the user's input is parsed directly in the UI, we can immediately provide feedback when a provided parameter doesn't match expected values. In the past, users would only find out that their request contained an error hours or days later when it actually executed. In the new design, nearly all input errors can be detected or prevented.
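As a rough illustration of the kind of immediate feedback this enables, the sketch below checks each parameter against its expected values as soon as it is entered; the parameter names and allowed values are invented for the example and are not the product's real request schema.

```python
# Hypothetical validation sketch: parameter names and allowed values are
# invented for illustration, not the product's real request schema.
ALLOWED_VALUES = {
    "identifierType": {"TICKER", "ISIN", "CUSIP", "FIGI"},
    "historyRange": {"1D", "1M", "1Y", "MAX"},
}

def validate_request(params: dict) -> list[str]:
    """Return human-readable errors for any parameter that is missing or
    doesn't match an expected value, before the request is ever submitted."""
    errors = []
    for name, allowed in ALLOWED_VALUES.items():
        value = params.get(name)
        if value is None:
            errors.append(f"Missing required parameter: {name}")
        elif value not in allowed:
            errors.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    return errors

# A typo is flagged while the form is being filled in, not hours after the
# scheduled job runs.
print(validate_request({"identifierType": "TICKR", "historyRange": "1M"}))
# ["identifierType='TICKR' is not one of ['CUSIP', 'FIGI', 'ISIN', 'TICKER']"]
```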
Get in touch if you want to learn more about this case study.