Pavel Samsonov

Bloomberg Data License

Bloomberg Enterprise sought to expand its share of the growing market for AI/ML training data by shortening the sales cycle for its Data License product.

Product impacts

60% reduction in failed data requests through up-front validation that corrected user error in forming requests.
75% faster time-to-revenue accomplished by reducing the amount of integration, training, and data cleaning required.
20x faster task completion for both technical and non-technical users of the product.
90% reduction in support queries compared to the incumbent, SFTP-driven solution.

Reducing time-to-revenue for Bloomberg data

To comply with non-disclosure agreements, all confidential information has been omitted from this case study.

Data Scientists need to discover valuable data, purchase it, clean it, and plug it into their models before they can return for more data. The Global Head of Enterprise Data set time-to-revenue as a north star metric that would capture the health of the business across these dimensions.

Modeling the present state
  • Subject matter experts
  • Usage patterns
Segmenting customers and identifying pains
  • Customers
  • Help desk tickets
  • Customer-facing specialists

The incumbent process for discovering and downloading data was so complex that the existing users of the data - quantitative researchers, or “quants” - could not self-serve. Instead, the Market Data group worked with Bloomberg’s Content team to set up a single large, once-a-day data request. This intermediary process was the main cause of our high time-to-revenue (TTR).

Iterating through artifacts
  • Information architecture
  • API design
  • Wireframes
  • JS prototypes
Validating with customers
  • Quants
  • Data Scientists
  • Data Managers

I worked with the Content team to identify which intermediate steps could be cut to get customers closer to our data. We defined a set of core needs that let us sidestep most of the complexity accrued over 20 years of product evolution. The BI team identified a cohort of customers that used only the core capabilities, and my team rapidly tested prototypes with them to arrive at an impactful initial scope that reduced TTR by 75% when released.

Resource manager: Allowing users of both the GUI and API to update individual resources without needing to cancel and re-submit the recurring data request removed one of the biggest time sinks for Market Data teams.

Context: Bloomberg Data License Modernization

Data License is a Bloomberg Enterprise business for non-real-time datasets, offered in two flavors: Bulk and Custom datasets. I joined Bloomberg as the business was modernizing Data License from a legacy SFTP-driven backend to a modern HTTP-based transport layer.

The Global Head of Enterprise Data saw custom data as an opportunity for growing the business. While the bulk data side was relatively straightforward, the custom datasets offered significant room for improvement due to the complexity of the product.

The North Star metric for this strategy was Time-to-Revenue. Reducing the amount of time it took for a data team to gain value from a dataset would mean that they could come back for more sooner.

Input metrics
  • Time to discover data
  • Time to understand that the data was what they needed
  • Time to integrate Bloomberg with their systems
  • Time to clean the data (up to 80% of their time)

My focus was on the discoverability and self-service layer. A back-end PM and a Data PM would take care of the last two input metrics, in accordance with the user experience requirements I specified.

Research into the current state

Bloomberg was a decades-old business with inconsistently documented products developed across ad-hoc use cases and legacy systems. I began by creating a model of the current state of both available offerings and customer needs.

One of Bloomberg's competitive advantages is its high-touch client relations. I sought out a variety of specialists who could tell me about the client from multiple angles.

Subject matter expert learnings
  • Service Delivery: how clients were currently being trained
  • Help Desk backlog: recurring pain points
  • Sales teams: customer requests
  • Business Intelligence: data on usage of individual features

I also reached out to customers - both internal and external users of the tool. I used the PACT Analysis framework throughout my conversations to synthesize their different perspectives and expertise into a complete picture of our users.

My research revealed the extent to which the current state of Bloomberg Data required white-glove service. Over the years, the specification code for custom datasets had evolved to the complexity of a programming language. Users who did not want to take a week of training to learn it had to use the Request Builder, downloadable software that was difficult to deploy in an Enterprise environment and complex to use.

The incumbent workflow is presented here as a service blueprint. Market Data teams from the client's side, and Content teams from Bloomberg's side, do a lot of manual work to get data requests from quants to Bloomberg's data service.

As a result, the Market Data groups of Bloomberg customers had to take on a liaison role with Bloomberg’s Content team. Instead of focusing on acquiring new data products, Market Data acted as an intermediary - collecting data requests from quants via Excel spreadsheets, having Content assemble them into a dataset specification, and then cutting up the big file that came back and sending it on to their internal customers. The Excel sheets also made Market Data’s own job - maintaining a single source of truth for the company’s data - frustrating, manual, and time-consuming.

Example screens of the Request Builder, showing the Fields Picker and the Headers configuration interfaces. Most of the visible settings were not used in typical workflows.

Even though these datasets were generated daily, most customers could not actually use the system scheduler: requests in the SFTP system could not be edited, so no matter how minor the change, customers had to cancel the request and re-submit it. Many customers ended up developing their own bespoke scheduling software. Integration was made harder still because the data delivered over SFTP bore little resemblance to what appeared in Bloomberg’s data catalogs. Together, these issues meant new customers faced a steep cost of integrating with Bloomberg systems.

This process repeated not only for large datasets that were updated once per day, but also for the small datasets that quants wanted to pull throughout the day to understand the “shape” of the data and see what they could do with it before buying the full dataset or time series. The slower quants got their requested data back, the fewer datasets they would end up sampling and buying.

Future State: Self-service optimizations

To remove the additional drain on the Market Data team’s time so they could focus on sourcing new data, we took a two-pronged approach. We would create a best-practices version of what our customers were already doing - such as the scheduler and the source of truth - directly on the website. We would also, where possible, create a self-service capability for data scientists to sample, download, and even purchase data without needing to go through Market Data or Content.

An added requirement from business and engineering was encouraging the use of the scheduler, to provide load balancing and more reliable revenue.

While stakeholders pushed for a 1:1 replacement of every SFTP feature, the MVP release would have to focus on a subset of users. Together with the Content and BI teams, we defined a set of core needs that let us sidestep most of the complexity accrued over 20 years of product evolution, while still covering over 90% of users.

We would simplify the Market Data team’s job of keeping a single source of truth by building Resources directly into Data License. We would also create a catalog that was identical across the GUI and API, so that command-line users and website users could both manipulate it. That way we would remove the barrier to self-service, leaving Market Data to focus full time on purchasing data.

A single resource could also serve both the sampling and the scheduled-dataset use cases: a quant could define a set of fields and test it against a small range of securities and a shallow time series, and Market Data could then refer to that same list while expanding the universe of securities.

The resource-based data model folded a lot of complexity into a standardized format.
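
To make the resource concept concrete, here is a minimal sketch of the model. The interface names, field mnemonics, and URI shapes are my own illustrative assumptions, not the production Data License schema.

```typescript
// Minimal sketch of a resource-based data model. All names, field
// mnemonics, and URI shapes here are illustrative assumptions, not the
// actual Data License schema.

// A resource is a named, versioned list that lives in the customer's catalog.
interface Resource {
  uri: string;            // e.g. "/catalogs/acme/resources/energy-universe"
  version: number;        // bumped on every update, for governance
  identifiers: string[];  // instruments (tickers, FIGIs, etc.)
  fields: string[];       // data fields to return for each instrument
}

// Both a one-off sample and a recurring dataset point at the same resource,
// so a single update propagates to every request built on it.
interface DataRequest {
  resourceUri: string;
  // A shallow sample a quant runs to inspect the "shape" of the data...
  sample?: { maxIdentifiers: number; historyDays: number };
  // ...or a recurring schedule Market Data sets up for production use.
  schedule?: { frequency: "daily"; deliveryTime: string };
}

const energyUniverse: Resource = {
  uri: "/catalogs/acme/resources/energy-universe",
  version: 3,
  identifiers: ["EXAMPLE1 US Equity", "EXAMPLE2 LN Equity"],
  fields: ["PX_LAST", "PX_VOLUME"],
};

// The quant's exploratory sample and the scheduled production dataset
// share one definition instead of two diverging spreadsheets.
const sampleRequest: DataRequest = {
  resourceUri: energyUniverse.uri,
  sample: { maxIdentifiers: 50, historyDays: 5 },
};

const scheduledRequest: DataRequest = {
  resourceUri: energyUniverse.uri,
  schedule: { frequency: "daily", deliveryTime: "06:00" },
};
```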

Iterative Design Process

I developed two low-fidelity designs to put these concepts into practice and see how they changed the workflows of our users. The first design treated the request as a shopping cart, with the user selecting resources from the data catalog. The second design treated resources and elements as interchangeable, in a tag-style UI.

Three possible workflows that users followed when asked to create a new request. Regardless of the path they picked, users had concerns about the visibility of system status.

Users enjoyed the simplified interface that allowed them to focus on the content of the request. However, they found it challenging to navigate the list of all available resources once it grew longer than a few items. Within the resources themselves, users wanted to see their instruments and fields in a table format that allowed them to easily manage lists of hundreds to hundreds of thousands of items; seeing a small number of identifiers from each resource was not valuable to them.

I designed an updated workflow incorporating this feedback, advancing the level of fidelity of the project. As the new designs began to match the users' mental model more closely, I pivoted from gathering qualitative feedback to measuring task time, error rates, and success rate metrics.

Once users select a resource to add to the request, they can see the contents of that resource to validate their choice. If the choice is correct, the user needs to do nothing else.

I worked with the engineering team to break the scope down into implementable user stories, and created a phased rollout for the desired capability. I worked with the Content and Sales teams to slowly onboard a small number of users to test our assumptions about real use in a controlled environment. We would track usage through Business Intelligence and BEAP analytics, as well as through regular touch-bases with the trial users.

The design of the custom dataset builder evolved alongside Bloomberg's nascent design system, and informed its requirements.

Product outcomes

The new Custom Dataset interface minimizes the amount of effort necessary to make consistent data requests, by leveraging the ability to create and reuse component resources. Users were able to set up an error-free request in under one minute without having ever used the original Request Builder interface, and with no training on the tool.

Users can base their dataset on any combination of resources from existing Data License datasets, allowing them to get started manipulating data without incurring any new costs.
Resources are versioned and bound to a specific customer catalog, ensuring effective governance. The URI is exposed so that scripting against the resource is as easy as manipulating it in the UI.

Whether creating a request for only a few instruments or managing a portfolio of hundreds of thousands, users only need to define a resource once, and can update it once to propagate changes across all datasets using the resource as part of their schema. Clients no longer need to cancel recurring requests, update their instruments, and then re-submit them all. Overall, the new workflow took over 95% less time than before.
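
As a rough sketch of that scripting path - the endpoint shape, payload, and helper below are my own assumptions for illustration, not the documented Data License or BEAP API - a client can update a resource in place and let every scheduled dataset that references it pick up the change on its next run.

```typescript
// Hypothetical sketch: update one resource via its exposed URI so that every
// recurring dataset built on it uses the new version automatically.
// Paths, payloads, and auth are assumptions, not the documented API.

async function addInstrumentsToResource(
  baseUrl: string,
  resourceUri: string,      // e.g. "/catalogs/acme/resources/energy-universe"
  newIdentifiers: string[],
  token: string,
): Promise<void> {
  // Read the current, versioned definition.
  const res = await fetch(`${baseUrl}${resourceUri}`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`Failed to load resource: ${res.status}`);
  const resource = await res.json();

  // Append the new instruments and write the resource back once.
  // No recurring request has to be cancelled or re-submitted.
  const updated = {
    ...resource,
    identifiers: [...resource.identifiers, ...newIdentifiers],
  };
  const put = await fetch(`${baseUrl}${resourceUri}`, {
    method: "PUT",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(updated),
  });
  if (!put.ok) throw new Error(`Failed to update resource: ${put.status}`);
}
```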

Because the user's input is parsed directly in the UI, we can immediately provide feedback when a provided parameter doesn't match expected values. In the past, users would only find out that their request contained an error hours or days later when it actually executed. In the new design, nearly all input errors can be detected or prevented.
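
A minimal sketch of that idea, assuming invented identifier formats and error messages rather than the real validation rules: the pasted list is parsed line by line in the browser, and anything that does not match a known shape is flagged before the request can be submitted.

```typescript
// Hypothetical sketch of front-end validation. The identifier formats and
// messages are invented for illustration; the real rules are Bloomberg's.

interface ValidationIssue {
  line: number;
  value: string;
  message: string;
}

// Accept a couple of common identifier shapes (a FIGI-like token, or a
// "TICKER Exchange Equity"-style string) and flag everything else.
const FIGI_LIKE = /^BBG[A-Z0-9]{9}$/;
const TICKER_LIKE = /^[A-Z0-9.]{1,12}(\s+[A-Za-z]+){1,2}$/;

function validatePastedIdentifiers(pasted: string): ValidationIssue[] {
  const issues: ValidationIssue[] = [];
  pasted.split("\n").forEach((raw, i) => {
    const value = raw.trim();
    if (!value) return; // ignore blank lines copied from the spreadsheet
    if (!FIGI_LIKE.test(value) && !TICKER_LIKE.test(value)) {
      issues.push({ line: i + 1, value, message: "Unrecognized identifier format" });
    }
  });
  return issues;
}

// The request is only submittable once the issue list is empty, so a typo
// surfaces in seconds instead of hours or days later when the job runs.
```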

The resource selector enables users to connect their existing workflows to the new tool, whether using any of Bloomberg's existing catalogs or pasting from an Excel spreadsheet.
Even request files that have been used for many years can contain errors. The new Data License UI catches them on the front end with a real-time verification system.
Name and description generate automatically based on the resources selected, ensuring that other users who share the catalog won't see it fill up with inscrutable junk datasets.

Get in touch if you want to learn more about this case study.