Ontology-based Information Extraction with SOBA
In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontology-based information extraction from soccer web pages for automatic population of a knowledge base that can be used for domain-specific question answering. SOBA realizes a tight connection between the ontology, knowledge base and the information extraction component. The originality of SOBA is in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base, and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities.
Schema-Driven Information Extraction from Heterogeneous Tables
In this paper, we explore the question of whether large language models can
support cost-efficient information extraction from tables. We introduce
schema-driven information extraction, a new task that transforms tabular data
into structured records following a human-authored schema. To assess various
LLMs' capabilities on this task, we develop a benchmark composed of tables from
four diverse domains: machine learning papers, chemistry literature, material
science journals, and webpages. Alongside the benchmark, we present an
extraction method based on instruction-tuned LLMs. Our approach shows
competitive performance without task-specific labels, achieving F1 scores
ranging from 74.2 to 96.1, while maintaining great cost efficiency. Moreover,
we validate the possibility of distilling compact table-extraction models to
reduce API reliance, as well as extraction from image tables using multi-modal
models. By developing a benchmark and demonstrating the feasibility of this
task using proprietary models, we aim to support future work on open-source
schema-driven IE models.
InfoSync: Information Synchronization across Multilingual Semi-structured Tables
Information Synchronization of semi-structured data across languages is
challenging. For instance, Wikipedia tables in one language should be
synchronized across languages. To address this problem, we introduce a new
dataset InfoSync and a two-step method for tabular synchronization. InfoSync
contains 100K entity-centric tables (Wikipedia Infoboxes) across 14 languages,
of which a subset (3.5K pairs) are manually annotated. The proposed method
includes 1) Information Alignment to map rows and 2) Information Update for
updating missing/outdated information for aligned tables across multilingual
tables. When evaluated on InfoSync, information alignment achieves an F1 score
of 87.91 (en <-> non-en). To evaluate information updating, we perform
human-assisted Wikipedia edits on Infoboxes for 603 table pairs. Our approach
obtains an acceptance rate of 77.28% on Wikipedia, showing the effectiveness of
the proposed method. Comment: 22 pages, 7 figures, 20 tables, ACL 2023 (Toronto, Canada)
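The two steps named in the abstract above (align rows, then update missing information) can be illustrated with a toy sketch. This is not the authors' method: the `key_map` dictionary, the function names, and the example infobox rows are all hypothetical, and real alignment across languages would need a multilingual matching model rather than a hand-written lookup. Values are copied as-is, with no translation.

```python
def align_rows(source, target, key_map):
    """Step 1 (alignment): pair each source row key with its target row key.

    key_map is a hypothetical hand-written lookup from source-language
    row labels to target-language row labels.
    """
    aligned = []
    for src_key in source:
        tgt_key = key_map.get(src_key)
        if tgt_key in target:
            aligned.append((src_key, tgt_key))
    return aligned


def propose_updates(source, target, key_map):
    """Step 2 (update): list rows present in source but missing from target."""
    updates = {}
    for src_key, value in source.items():
        tgt_key = key_map.get(src_key)
        if tgt_key is not None and tgt_key not in target:
            updates[tgt_key] = value
    return updates


# Made-up example rows for an English and a German infobox.
en_table = {"Born": "1879", "Died": "1955"}
de_table = {"Geboren": "1879"}
key_map = {"Born": "Geboren", "Died": "Gestorben"}

aligned = align_rows(en_table, de_table, key_map)
updates = propose_updates(en_table, de_table, key_map)
```

Here `aligned` pairs the rows both tables already share, and `updates` holds the rows a human editor could review before adding them to the target table.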
StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding
Charts are common in literature across different scientific fields, conveying
rich information easily accessible to readers. Current chart-related tasks
focus on either chart perception which refers to extracting information from
the visual charts, or performing reasoning given the extracted data, e.g. in a
tabular form. In this paper, we aim to establish a unified and label-efficient
learning paradigm for joint perception and reasoning tasks, which can be
generally applicable to different downstream tasks, beyond the
question-answering task as specifically studied in peer works. Specifically,
StructChart first reformulates the chart information from the popular tabular
form (specifically linearized CSV) to the proposed Structured Triplet
Representations (STR), which is more friendly for reducing the task gap between
chart perception and reasoning due to the employed structured information
extraction for charts. We then propose a Structuring Chart-oriented
Representation Metric (SCRM) to quantitatively evaluate the performance for the
chart perception task. To enrich the dataset for training, we further explore
the possibility of leveraging the Large Language Model (LLM), enhancing the
chart diversity in terms of both chart visual style and its statistical
information. Extensive experiments are conducted on various chart-related
tasks, demonstrating the effectiveness and promising potential for a unified
chart perception-reasoning paradigm to push the frontier of chart
understanding. Comment: SimChart9K is available for downloading at:
https://github.com/UniModal4Reasoning/SimChart9K. 21 pages, 11 figures
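As a rough illustration of moving from linearized CSV to a triplet form, the sketch below turns each table cell into a (row label, column label, value) triple. The exact layout of the Structured Triplet Representations is not given in the abstract, so this triple shape, the function name `csv_to_triplets`, and the sample data are assumptions.

```python
import csv
import io


def csv_to_triplets(csv_text):
    """Convert linearized CSV into (row_label, column_label, value) triplets.

    Assumes the first row is a header and the first column holds row labels.
    """
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    header, body = rows[0], rows[1:]
    triplets = []
    for row in body:
        row_label = row[0]
        for col_label, value in zip(header[1:], row[1:]):
            triplets.append((row_label, col_label, value))
    return triplets


# Made-up chart data in linearized CSV form.
chart_csv = """year,sales,profit
2021,10,2
2022,15,4
"""

triplets = csv_to_triplets(chart_csv)
```

Each triplet stands on its own, so a set of triplets can be compared cell by cell regardless of row or column order, which a flat CSV string does not allow directly.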
Multimodal Document Analytics for Banking Process Automation
In response to growing FinTech competition and the need for improved
operational efficiency, this research focuses on understanding the potential of
advanced document analytics, particularly using multimodal models, in banking
processes. We perform a comprehensive analysis of the diverse banking document
landscape, highlighting the opportunities for efficiency gains through
automation and advanced analytics techniques in the customer business. Building
on the rapidly evolving field of natural language processing (NLP), we
illustrate the potential of models such as LayoutXLM, a cross-lingual,
multimodal, pre-trained model, for analyzing diverse documents in the banking
sector. This model performs a text token classification on German company
register extracts with an overall F1 score performance of around 80%. Our
empirical evidence confirms the critical role of layout information in
improving model performance and further underscores the benefits of integrating
image information. Interestingly, our study shows that over 75% F1 score can be
achieved with only 30% of the training data, demonstrating the efficiency of
LayoutXLM. Through addressing state-of-the-art document analysis frameworks,
our study aims to enhance process efficiency and demonstrate the real-world
applicability and benefits of multimodal models within banking. Comment: A Preprint