MicroConceptBERT: concept-relation based document information extraction framework.

Authors: Nanayakkara Gayani, Silva Kanishka, Silva Thushari
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 31/12/2023

Extracting information from documents is a crucial task in natural language processing research. Existing information extraction methodologies often focus on specific domains, such as medicine, education or finance, and are limited by language constraints. However, more comprehensive approaches that transcend document types, languages, contexts, and structures, as proposed in recent research, would significantly advance the field. This study addresses this challenge by introducing microConceptBERT: a concept-relations-based framework for document information extraction, which offers flexibility for various document processing tasks while accounting for hierarchical, semantic, and heuristic features. The proposed framework has been applied to a question-answering task on two benchmark datasets: SQuAD 2.0 and DocVQA. Notably, it attains an F1 score of 87.01 on the SQuAD 2.0 dataset, outperforming the baseline models BERT-base and BERT-large.
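
The abstract reports an F1 score but does not say how it is computed. As a rough illustration only, the sketch below assumes the token-overlap F1 conventionally used for SQuAD-style question answering; the function names `normalize_answer` and `token_f1` are hypothetical and are not taken from the paper.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> list:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer string.

    Assumption: this mirrors the usual SQuAD-style metric; the paper's
    exact evaluation procedure is not described in the abstract.
    """
    pred_tokens = normalize_answer(prediction)
    ref_tokens = normalize_answer(reference)
    # Unanswerable questions (SQuAD 2.0): score 1 only if both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score of this kind is typically the mean of the per-question values (taking the best match when a question has several reference answers), expressed as a percentage.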

Open Access Institutional Repository at Robert Gordon University