A Benchmark for Structured Extractions from Complex Documents

Lee, Chen-Yu; Tata, Sandeep; Wang, Zilong; Wei, Wei; Zhou, Yichao

Publication date: 14 November 2022

Abstract

Understanding visually rich business documents to extract structured data and automate business workflows has been receiving attention in both academia and industry. Although recent multi-modal language models have achieved impressive results, we find that existing benchmarks do not reflect the complexity of real documents seen in industry. In this work, we identify the desiderata for a more comprehensive benchmark and propose one we call Visually Rich Document Understanding (VRDU). VRDU contains two datasets that represent several challenges: rich schemas that include diverse data types as well as nested entities, complex templates that include tables and multi-column layouts, and a diversity of layouts (templates) within a single document type. We design few-shot and conventional experiment settings, along with a carefully designed matching algorithm, to evaluate extraction results. We report the performance of strong baselines and offer three observations: (1) generalizing to new document templates is very challenging, (2) few-shot performance has a lot of headroom, and (3) models struggle with nested fields such as line items in an invoice. We plan to open-source the benchmark and the evaluation toolkit. We hope this helps the community make progress on these challenging tasks of extracting structured data from visually rich documents.

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2211.15421

Last time updated on 30/12/2022