Pavel Samsonov,
product designer

A tool for human agents to train virtual assistants, and seamlessly take over when the bots stumble

To comply with non-disclosure agreements, all confidential information has been omitted from this case study.
  • Workfusion developed a human-in-the-loop component for a partner's chatbot product, and I led the user experience effort for the project.
  • I worked with business analysts to understand the situations that required human intervention, and how to resolve those hang-ups quickly.
  • Using a lean design methodology, we quickly iterated on and tested designs to deliver results to the client on an aggressive schedule.
  • The final product reduced the total time agents would spend on tasks by 90%, and significantly accelerated the bot's learning.

Context

Workfusion partnered with a Fortune 100 company to develop a virtual assistant bot that would be used across a variety of platforms and provide various concierge services. Since AI technology is far from perfect, Workfusion wanted to develop a system that allowed humans to intervene when the bot was not confident in its understanding of the situation. A team of human agents would be available around the clock to take over a conversation, resolve uncertainty, and seamlessly return the reins to the bot.
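The handoff details are confidential, but the core idea can be shown with a minimal sketch: each bot reply carries a confidence score, and replies below a threshold are escalated to a queue of human agents instead of being sent automatically. The class names, threshold value, and holding message below are assumptions made for illustration, not details of the actual system.

    from dataclasses import dataclass

    # Illustrative sketch only: names and the 0.8 threshold are assumptions,
    # not the actual Workfusion implementation.

    @dataclass
    class BotReply:
        text: str
        confidence: float  # the bot's confidence in its interpretation, 0.0 to 1.0

    CONFIDENCE_THRESHOLD = 0.8  # hypothetical cutoff for escalating to a human agent

    def route_reply(reply: BotReply, human_queue: list) -> str:
        """Send confident replies directly; escalate uncertain ones to a human agent."""
        if reply.confidence >= CONFIDENCE_THRESHOLD:
            return reply.text                                # bot answers on its own
        human_queue.append(reply)                            # a human agent takes over the conversation
        return "One moment while I check that for you."      # holding message while the agent responds

    # A low-confidence reply gets queued for a human agent to handle.
    queue: list = []
    print(route_reply(BotReply("Your table is booked for 7pm.", 0.55), queue))
    print(len(queue))  # 1 -- the conversation is now waiting on a human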

My roles

Discovery

The concept of human-in-the-loop chatbots was too new to have many established best practices, so I drew on an analogous domain in my research: the online help desk interfaces used by customer service agents. I conducted interviews with professionals in customer support training and with customer support agents. Since Workfusion had already created a number of tools for organizing unstructured data, I also interviewed the analysts who had helped develop those tools, to learn about the pitfalls we had encountered and the valuable discoveries we had made along the way.

To explore the human-to-human chat interface, I worked with the crowd manager and developers to set up thousands of conversations between crowd workers, built around strictly defined tasks such as finding and booking restaurants. We analyzed and grouped the free-text responses of the workers in the agent role to see what kinds of pre-set responses agents would find valuable. I also gathered a large amount of feedback from the workers themselves about their experience, and consolidated it in an affinity diagram to surface flaws that would need to be addressed in later development.
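The sketch below only illustrates the kind of lexical grouping involved in turning free-text replies into candidate pre-set responses; the sample replies and the 0.6 similarity cutoff are invented for the example and are not the project's actual data or method.

    from difflib import SequenceMatcher

    # Invented sample replies; the real transcripts are confidential.
    responses = [
        "I found a table for two at 7pm, shall I book it?",
        "Found a table for 2 at 7 pm - want me to book it?",
        "Sorry, that restaurant is fully booked tonight.",
        "That restaurant is fully booked this evening, sorry.",
    ]

    def similar(a: str, b: str, cutoff: float = 0.6) -> bool:
        """Rough lexical similarity between two free-text agent replies."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= cutoff

    groups: list[list[str]] = []
    for reply in responses:
        for group in groups:
            if similar(reply, group[0]):
                group.append(reply)   # same cluster: a candidate for one pre-set response
                break
        else:
            groups.append([reply])    # no match: start a new cluster

    for group in groups:
        print(len(group), "->", group[0])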

Design Process

Development followed the Lean methodology: we would design a minimum viable version of the product, push it into development, and use the knowledge gained from testing it to shape the next version. The project manager and crowd manager were responsible for gathering this information, and I designed the next iteration of the product. I maintained a sequence diagram for every user flow within the tool and extended it whenever I designed a change. I then created low-fidelity designs, which I user-tested and brought to the team for discussion. Based on the test results and feedback, I developed these designs into high-fidelity mockups and then prototypes.

The project manager would sometimes reject designs because of infrastructure constraints: the described functionality could not be supported until later in the project. When this happened, I designed workaround features and implemented them in code until the development team could support the original vision.

Once the structure of the product was in place, I focused my efforts on optimizing the speed at which agents could work. I developed and tested several new interaction techniques for structuring text data; the best-performing technique was 20% faster than our existing tagging interface, which helped agents manage a larger number of simultaneous conversations.

Outcomes

The product is still in development, but the internal beta test was a success – my colleagues often use the bot when they need to accomplish tasks within its current purview, and they can rarely tell the difference between automated and human agent replies.

The agents themselves have provided very positive feedback about the product, especially compared with the original feedback collected in the research phase. The time to complete requests also dropped dramatically from our initial results: from 15 minutes per request to 10 seconds.