Pavel Samsonov,
product designer

A tool for human agents to train virtual assistants, and seamlessly take over when the bots stumble

To comply with non-disclosure agreements, all confidential information has been omitted from this case study.
  • Workfusion developed a human-in-the-loop component for a partner's chatbot product, and I led the user experience effort for the project.
  • I worked with business analysts to understand the situations that required human intervention, and how to resolve those hang-ups quickly.
  • Using a lean design methodology, we quickly iterated on and tested designs to deliver results to the client on an aggressive schedule.
  • The final product reduced the total time agents would spend on tasks by 90%, and significantly accelerated the bot's learning.

Context

Workfusion partnered with a Fortune 100 company to develop a virtual assistant bot that would be used across a variety of platforms and provide various concierge services. Since AI technology is far from perfect, Workfusion wanted to develop a system that allowed humans to intervene when the bot was not confident in its understanding of the situation. A team of human agents would be available around the clock to take over a conversation, resolve uncertainty, and seamlessly return the reins to the bot.
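The handoff details are confidential, but the core idea can be shown with a minimal sketch: each bot reply carries a confidence score, and replies below a threshold are escalated to a queue of human agents instead of being sent automatically. The class names, threshold value, and holding message below are assumptions made for illustration, not details of the actual system.

    from dataclasses import dataclass

    # Illustrative sketch only: names and the 0.8 threshold are assumptions,
    # not the actual Workfusion implementation.

    @dataclass
    class BotReply:
        text: str
        confidence: float  # the bot's confidence in its interpretation, 0.0 to 1.0

    CONFIDENCE_THRESHOLD = 0.8  # hypothetical cutoff for escalating to a human agent

    def route_reply(reply: BotReply, human_queue: list) -> str:
        """Send confident replies directly; escalate uncertain ones to a human agent."""
        if reply.confidence >= CONFIDENCE_THRESHOLD:
            return reply.text                                # bot answers on its own
        human_queue.append(reply)                            # a human agent takes over the conversation
        return "One moment while I check that for you."      # holding message while the agent responds

    # A low-confidence reply gets queued for a human agent to handle.
    queue: list = []
    print(route_reply(BotReply("Your table is booked for 7pm.", 0.55), queue))
    print(len(queue))  # 1 -- the conversation is now waiting on a human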

My roles

Discovery

The concept of human-in-the-loop chatbots was too new to have many established best practices, so I drew on an analogous domain in my research: the online help desk interfaces used by customer service agents. I conducted interviews with professionals in customer support training and with customer support agents. Since Workfusion had already created a number of tools for organizing unstructured data, I also interviewed the analysts who had helped develop those tools, to learn about the pitfalls we had encountered and the valuable discoveries we had made along the way.

To explore the human-to-human chat interface, I worked with the crowd manager and developers to set up thousands of conversations between crowd workers, built around strictly defined tasks such as finding and booking restaurants. We analyzed and grouped the free-text responses of the workers in the agent role to see what kinds of pre-set responses agents would find valuable. I also gathered a large amount of feedback from the workers themselves about their experience, and consolidated it in an affinity diagram to surface flaws that would need to be addressed in later development.
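The sketch below only illustrates the kind of lexical grouping involved in turning free-text replies into candidate pre-set responses; the sample replies and the 0.6 similarity cutoff are invented for the example and are not the project's actual data or method.

    from difflib import SequenceMatcher

    # Invented sample replies; the real transcripts are confidential.
    responses = [
        "I found a table for two at 7pm, shall I book it?",
        "Found a table for 2 at 7 pm - want me to book it?",
        "Sorry, that restaurant is fully booked tonight.",
        "That restaurant is fully booked this evening, sorry.",
    ]

    def similar(a: str, b: str, cutoff: float = 0.6) -> bool:
        """Rough lexical similarity between two free-text agent replies."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= cutoff

    groups: list[list[str]] = []
    for reply in responses:
        for group in groups:
            if similar(reply, group[0]):
                group.append(reply)   # same cluster: a candidate for one pre-set response
                break
        else:
            groups.append([reply])    # no match: start a new cluster

    for group in groups:
        print(len(group), "->", group[0])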

Design Process

Development followed the Lean methodology: we would design a minimum viable version of the product, push it into development, and use the knowledge gained from testing it to shape the next version. The project manager and crowd manager were responsible for gathering this information, and I designed the next iteration of the product. I maintained a sequence diagram for every user flow within the tool and extended it whenever I designed a change. I then created low-fidelity designs, which I user-tested and brought to the team for discussion. Based on the test results and feedback, I developed these designs into high-fidelity mockups and then prototypes.

The project manager would sometimes reject designs because of infrastructure constraints: the described functionality could not be supported until later in the project. When this happened, I designed workaround features and implemented them in code until the development team could support the original vision.

Once the structure of the product was in place, I focused my efforts on optimizing the speed at which agents could work. I developed and tested several new interaction techniques for structuring text data; the best-performing technique was 20% faster than our existing tagging interface, which helped agents manage a larger number of simultaneous conversations.

Outcomes

The product is still in development, but the internal beta test was a success – my colleagues often use the bot when they need to accomplish tasks within its current purview, and they can rarely tell the difference between automated and human agent replies.

The agents themselves have provided very positive feedback about the product, especially compared with the original feedback collected in the research phase. The time to complete requests also dropped dramatically from our initial results: from 15 minutes per request to 10 seconds.