Pavel Samsonov,
product designer

An annotation tool for extracting and structuring data in financial documents

To comply with non-disclosure agreements, all confidential information has been omitted from this case study.
  • A major bank wanted to automate invoice handling using Workfusion. I led a team to close the gap between our existing tool and the client's needs.
  • User research revealed a set of pain points, and with the help of stakeholder interviews I prioritized the issues that were the most important to solve.
  • I focused the design on optimizing frequently repeated actions, and presenting detailed information about the document in a more visible way.
  • The tool led to a 60-fold decrease in document processing time and an 80% reduction in necessary headcount.

Context

Workfusion was contracted by a major international bank to automate invoice handling within their systems. The bank's existing process required a lot of manual work by junior business analysts, and though this process was outsourced overseas, it was still slow and costly. The client wanted to use Workfusion's automation tools to reduce headcount and improve processing speed.

At the time, Workfusion's tools required input data to be structured (for example, as a table with defined rows and columns) before it could be processed by the machine learning algorithm. The client's business processes produced only unstructured data (scanned PDF files), so one of Workfusion's responsibilities would be to structure that data for the client.

A high-level diagram of Workfusion's basic process flow. For this project, we would need to add a step to convert data from PDFs into something that looked more like a CSV: a structured table of rows and columns.

Workfusion had previously developed a simple interface for extracting structured data from plain text. Workers using the tool could highlight text from a paragraph, and tag it by clicking a label or using a hotkey. We decided to use this tool as the basis of our Proof of Concept: we would set up a process inside Workfusion that would convert PDFs to text and present that text to crowd workers to be structured using the tagging interface.

The initial state of Information Extraction (InfEx), the foundation for Highlight. Tasks set up in InfEx could return one value for each tag that the task creator specified; effectively, each task produced one row in a table.

At the time of the project, the workflow logic necessary to make this tool serve our purpose could not be set up within the Workfusion UI. Our management decided that our developers would use custom code to manage the setup for these tasks during the Proof of Concept period. While these developers were focused on exposing the settings to make this setup easier, I proposed that the text extraction tool itself could be improved, in order to process more data at a time and obviate the need for complicated setup on the task publication side.

I took up the role of feature owner on Project Highlight. While a larger team improved the capabilities of the workflow design tool within Workfusion, I would be designing the new Highlight tool and managing two developers to implement the design. The project would be governed by the overall goals of the company and the goals of the Proof of Concept project.

My roles

Discovery

The project needed to deliver outcomes to two audiences: the client setting up the tasks, and the workers executing them. To ensure that neither audience was left out, I created a two-pronged strategy for user research.

Data requester
  • Receives documents from an external workflow. Decides which documents need to be processed.
  • Determines the flow of the business process and contents of worker tasks. Cares about the quality of the output data.
  • Will not be using Workfusion UIs during the Proof of Concept period.
Task worker
  • Receives tasks submitted by the data requester through Workfusion. Chooses tasks in the worker portal.
  • Tags the text in documents received based on instructions. Cares about speed of task completion.
  • Will be using the portal and Highlight during the Proof of Concept period.

We deployed the text extraction tool to a private instance of Workfusion's crowd computing platform. When the client submitted a document for extraction, the client's employees could log in to this platform and complete the extraction tasks. I took advantage of the platform's existing analytics features to track every worker's task completion time, accuracy, and other metrics. I also performed qualitative testing, setting up think-aloud tests with workers and interviewing them afterwards about their experiences with the tool.

Because the client did not yet have access to the workflow design interface, I could not directly observe them using it. Instead, I brought together the expertise of several colleagues working on the Proof of Concept.

After synthesizing my research findings through affinity diagramming, I identified two key problem areas that could be greatly improved with the resources available to Project Highlight.

The first set of issues stemmed from the inability of our legacy extraction tool to group extracted data together. The machine learning algorithm could not infer that labels in the same row of the table were semantically connected. The Proof of Concept workflow could only get around this problem in two steps: one worker would have to tag every row in the table as an individual entity, and then Workfusion would generate a separate task for tagging each of those rows. This was a complicated flow to set up, and bugs in our backend would keep tasks from moving through it.

In addition to possible human error resulting in missing or duplicate rows, tasks would also sometimes get stuck, fail, or time out. Each added task increased the chance that the entire workflow would fail.
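The compounding risk can be illustrated with a rough calculation. The sketch below assumes, purely for illustration, that every task fails independently with the same probability. The 1% rate is a hypothetical figure, not one measured during the project.

```python
def workflow_failure_probability(per_task_failure: float, task_count: int) -> float:
    """Chance that at least one of task_count independent tasks fails."""
    return 1.0 - (1.0 - per_task_failure) ** task_count


# Hypothetical 1% per-task failure rate, compared across workflow sizes.
for task_count in (1, 10, 50):
    risk = workflow_failure_probability(0.01, task_count)
    print(f"{task_count:>2} tasks: {risk:.1%} chance of at least one failure")
```

Under these assumptions, a workflow split into 50 row-level tasks is far more likely to hit a failure than one that handles the whole table in a single task.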

This was also a bad experience for the workers. A worker had to open a task with the entire source document, then scroll down to find the one row assigned to them. The necessary labels would go off-screen as the worker scrolled down, requiring the worker to memorize the labels and their hotkeys. Ultimately, the worker wasted a lot of time scrolling, leading to very slow progress on document extraction.

The necessity of breaking each task up into rows compounded the problems with the worker interface. Inefficiencies in tagging a single row were multiplied across every row in the source document. A typical table with 4 columns and 50 rows might take 30 minutes to extract: a worker would have to wait for the system to generate 50 tasks, load each one in turn, struggle with the UI, then wait for the responses to be stitched back together into one document.

Sample breakdown of a typical document extracted through the original Workfusion process, with the time distribution of one task broken out. Total time elapsed: 30 minutes.

This was the best-case scenario. Often, the OCR would fail to correctly extract plain text from the input PDF file. Workers could not recover from this exception within the tool. They would have to mark the document's contents as corrupted, discharging the task, and revert to the client's existing, manual transcription tool.

Due to the frustrating experience of using our tool, many workers would mark correctly OCR'ed files as corrupted, so that they could go back to using their familiar program. 10% of documents were legitimately marked as corrupted, but an additional 15% were illegitimately marked. This meant that we would fail to process every fourth document submitted by the client, and users would need to manually merge its contents into their database.
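The "every fourth document" figure follows directly from the two rates above. A quick check, using only the percentages quoted in this section (the variable names are my own):

```python
legitimately_corrupted = 0.10   # OCR really failed
illegitimately_marked = 0.15    # workers opting out of the tool

# Both kinds of "corrupted" documents fall out of the automated workflow.
unprocessed_share = legitimately_corrupted + illegitimately_marked

print(f"Share of documents not processed: {unprocessed_share:.0%}")
print(f"Roughly one in every {round(1 / unprocessed_share)} documents")
```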

Design Process

Based on the scenarios of use gathered from my research, I created sequence diagrams that identified all the necessary states within the workflow. I designed UIs for each state, first as wireframes to refine with internal subject matter experts, then as low-fidelity clickable mockups that I presented to internal stakeholders. I set two experience principles to guide the design of Highlight and explain my design decisions to stakeholders.

Visibility and flow of data was the first experience principle. Since the goal of the project was to transform the data from a PDF into a standardized data store, it was only appropriate that the worker should have better visibility into both the initial and the desired state. The process would also need to be resilient to failures of our backend in carrying the data from one step to the next.

To eliminate the need for splitting tasks and improve the data flow, I designed a way for workers to define semantic relationships. The tool lets them quickly assign data from each row of the table to a separate group of tags. The panels scroll independently of one another, so workers always see the tags that they are working on.
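One way to picture tag groups is as a flat list of tags that each carry a group number, which can then be collected into one table row per group. This is a minimal sketch with hypothetical names and sample values; it is not Highlight's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    label: str   # what the text means, e.g. "amount"
    value: str   # the highlighted text
    group: int   # the tag group (table row) it belongs to


def tags_to_rows(tags: list[Tag]) -> list[dict[str, str]]:
    """Collect tags that share a group into one row per group."""
    rows: dict[int, dict[str, str]] = {}
    for tag in tags:
        rows.setdefault(tag.group, {})[tag.label] = tag.value
    # Return rows in group order so the table mirrors the document.
    return [rows[group] for group in sorted(rows)]


# Made-up sample data: two rows tagged within a single task.
tags = [
    Tag("vendor", "Acme Ltd", 0),
    Tag("amount", "120.00", 0),
    Tag("vendor", "Globex", 1),
    Tag("amount", "75.50", 1),
]
print(tags_to_rows(tags))
```

Because the grouping travels with each tag, a single task can return the whole table instead of one row.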

If the OCR malfunctioned, workers would be able to view the source PDF directly, and manually enter the information in a way that was similar to the tool they were used to. Finally, regardless of how they entered this data, their input would be normalized for categories such as country or currency, saving the client a lot of data cleaning work.
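Normalization of this kind can be as simple as a lookup from the variants workers type to one canonical code. The alias tables below are small hypothetical examples, and falling back to the trimmed input is my own assumption about how unknown values might be handled.

```python
# Hypothetical alias tables; a production list would be far larger.
COUNTRY_ALIASES = {
    "usa": "US", "u.s.": "US", "united states": "US",
    "uk": "GB", "united kingdom": "GB",
}
CURRENCY_ALIASES = {
    "$": "USD", "usd": "USD", "us dollar": "USD",
    "£": "GBP", "gbp": "GBP",
}


def normalize(value: str, aliases: dict[str, str]) -> str:
    """Map a typed or highlighted value to its canonical code.

    Unknown values are returned trimmed but otherwise unchanged,
    so nothing the worker entered is silently lost.
    """
    cleaned = value.strip()
    return aliases.get(cleaned.lower(), cleaned)


print(normalize("  United States ", COUNTRY_ALIASES))
print(normalize("$", CURRENCY_ALIASES))
```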

Mid-fidelity version of the Highlight UI, showing semantic grouping, original document view, normalized input, and independent content scrolling.

The second experience principle was efficiency. Many observed uses of the tool involved repeated actions, such as tagging each item in the Country column as the name of a country. Incorporating batch actions would drastically speed up document extraction and make it less tedious for workers to perform these tasks.

My collaboration with our data scientists led us to consider Highlight as an opportunity to test a new implementation of our machine learning algorithm that could run in the worker's browser, making suggestions. The worker could assume a direct supervisory role over the machine learning by deleting incorrect suggestions and accepting correct ones. In addition, I designed a batch highlighting functionality that would let users tag many cells of the table at once, reducing the time necessary for the machine to start making guesses.
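The batch action on its own, leaving the machine learning suggestions aside, can be sketched as tagging a whole column in one step and dropping each value into its row's tag group. The function and the sample table are hypothetical illustrations, not code from Highlight.

```python
from typing import Optional

Row = dict[str, str]


def batch_tag(table: list[list[str]], column: int, label: str,
              groups: Optional[list[Row]] = None) -> list[Row]:
    """Tag every cell in one column, adding each value to its row's group."""
    if groups is None:
        groups = [{} for _ in table]
    for row_index, cells in enumerate(table):
        groups[row_index][label] = cells[column]
    return groups


# Made-up sample table: two actions tag two full columns.
table = [
    ["Acme Ltd", "Germany", "120.00"],
    ["Globex", "France", "75.50"],
]
groups = batch_tag(table, column=1, label="country")
groups = batch_tag(table, column=2, label="amount", groups=groups)
print(groups)
```

With one action per column rather than one per cell, the number of worker actions depends on the number of columns, not the number of rows.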

After the worker batch-tags several rows, the machine learning immediately starts creating new tag groups and populating existing groups with new tags.

After scaling the scope down to a batch of work that could be implemented during the Proof of Concept period, I designed high-fidelity mockups for the reduced slice of the experience and recorded their requirements as JIRA stories.

Due to the limited number of developers available for implementing Highlight, I took on the role of a developer, writing front-end code alongside two engineers. I focused my efforts on interface animation and areas where multiple user flows collided, as well as User Acceptance Testing. I prioritized completing end-to-end user flows one at a time, which allowed me to frequently demo new features to internal stakeholders and quickly incorporate their feedback. This approach to feature development allowed us to deliver several iterations of Highlight throughout the Proof of Concept timeframe.

Outcomes

When all stakeholders were satisfied with testing results and stability, we reorganized the document processing workflow to use Highlight instead of the legacy tagging tool, and released it to the client's internal workforce. Workers were able to extract data much faster than before – a document that used to take 30 minutes with their original in-house method could be done in as little as 30 seconds when going through the Highlight workflow. Ultimately, this project led to an 80% reduction in full-time employees dealing with document extraction. Workers no longer felt the need to go off-tool due to the frustrating user experience, and were able to process all but the gnarliest PDFs within the Highlight tool.
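The 60-fold figure quoted in the summary comes from comparing these two times. The check below uses the best-case 30 seconds, so it is an upper bound on the improvement and not an average.

```python
before_seconds = 30 * 60   # 30 minutes per document
after_seconds = 30         # best case with Highlight

speedup = before_seconds / after_seconds
print(f"{speedup:.0f}-fold decrease in processing time")
```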

The straightforward user experience also allowed Workfusion to trial Highlight for public crowd usage, where time spent learning a new tool can be costly for workers who are paid per completed task. With incremental adjustments, Highlight proved to be easy enough for these workers to use, and Workfusion fully retired the original tool.

Learning outcomes

Going into this project, I had no product management training. It was challenging at first to switch from higher-level, value-focused descriptions and demonstrations of my designs to the low-level, highly detailed, state-by-state requirements that developers needed in order to implement the functionality. Part of the reason I contributed to the coding of Highlight was that I had difficulty explaining some of the more complex interactions.

However, diving into that level of detail made it difficult for me to focus on the bigger picture, and my flows ended up not being fully thought out in several areas. Encountering these bugs during a demo to stakeholders was embarrassing, but I was able to use the opportunity to get another developer for my team, freeing me up to focus on the product manager's role.

I did find that editing the code myself was a very effective strategy for situations where the developers had nearly, but not completely, followed requirements. Closing that gap without needing to hold demos and meetings greatly accelerated the pace of our work.

I also started off on shaky ground when demoing our progress to senior stakeholders. The team received a mixture of feedback about bugs and requests for new features, and I found it hard to push back against my manager's demands. This led to several cycles where we could not quickly hold another demo, since our new code branch would introduce bugs from new features even as we fixed bugs in the old ones. Realizing that it would take time for us to meet their requirements, the stakeholders relented and gave us the room to adopt a release cadence where we could fix known bugs, and only then open a branch for new features.