
    TEXT SUMMARIZATION UNDER LOW SUPERVISION

    Text summarization aims to create a concise and fluent summary that captures the most salient information from a given document or set of documents. However, most summarization methods require large-scale document-summary pairs as training data, which are laborious to acquire for many domains. This calls for the development of summarization algorithms that can work in a low-supervision setting, which remains a challenging problem. In this dissertation, we address the problem from three perspectives. We start by improving summarization methods using external information. Specifically, we focus on the task of product review summarization. We utilize the feature descriptions of the product as external information to better guide the model to identify aspect-related information from reviews and create corresponding summaries. Besides the use of external information, we also explore the use of external models, and propose a method that enables knowledge transfer from single-document summarization (SDS) to multi-document summarization (MDS). Our approach involves an efficient and effective technique of multiple document reordering, which facilitates both unsupervised and supervised MDS. In the third part, we present novel approaches to automatically construct high-quality paired training data for summarization. In particular, we introduce two large-scale datasets: Diana for dialogue summarization and NarraSum for narrative summarization. We experimentally demonstrate that pre-training on these datasets significantly improves summarization quality. Finally, given that the primary objective of summarization is to help users better grasp key information and understand the document, we investigate the potential of utilizing automatically constructed summarization datasets to enhance reading comprehension in a zero-shot manner. We propose Parrot, a zero-shot approach that leverages document-summary pairs for reading comprehension.
Our results demonstrate that Parrot outperforms previous zero-shot approaches and achieves performance comparable to fully supervised models, showcasing how text summarization can facilitate reading comprehension with minimal supervision.

    Doctor of Philosophy

    Statistical Sentence Extraction for Information Distillation

    Information distillation aims to extract the most useful pieces of information related to a given query from massive, possibly multilingual, audio and textual document sources. One critical component in a distillation engine is detecting the sentences to be extracted from each relevant document. In this paper, we present a statistical sentence extraction approach for distillation. Basically, we frame this task as a classification problem, where each candidate sentence in the documents is classified as relevant to the query or not. These documents may be in textual or audio format and in a number of languages. For audio documents, we use both manual and automatic transcriptions; for non-English documents, we use automatic translations. In this work, we use AdaBoost, a discriminative classification method, with both lexical and semantic features. The results indicate an 11%-13% relative improvement over a baseline keyword-spotting-based approach. We also show the robustness of our method on the audio subset of the document sources using manual and automatic transcriptions. Index Terms — information distillation, information extraction, language understanding, speech processing, natural language processing
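The core idea of the second abstract — treating query-relevant sentence extraction as binary classification with AdaBoost — can be sketched as follows. This is a minimal illustrative example, not the paper's actual system: the feature set (simple lexical-overlap statistics) and the toy training data are assumptions made here for demonstration; the paper's features and corpus are not reproduced.

```python
# Sketch: query-relevant sentence extraction framed as binary classification.
# Features and data below are illustrative assumptions, not the paper's.
from sklearn.ensemble import AdaBoostClassifier

def lexical_features(query, sentence):
    """Simple lexical-overlap features between a query and a candidate sentence."""
    q = set(query.lower().split())
    s = set(sentence.lower().split())
    overlap = q & s
    return [
        len(overlap),                   # number of query terms in the sentence
        len(overlap) / max(len(q), 1),  # fraction of query terms covered
        len(s),                         # sentence length in tokens
    ]

# Toy labeled data: (query, candidate sentence, is_relevant)
train = [
    ("flood damage", "The flood caused severe damage to the bridge.", 1),
    ("flood damage", "The weather was pleasant all week.", 0),
    ("flood damage", "Damage estimates from the flood reached millions.", 1),
    ("flood damage", "The committee met on Tuesday.", 0),
]
X = [lexical_features(q, s) for q, s, _ in train]
y = [label for _, _, label in train]

# AdaBoost over decision stumps, as in the discriminative setup described.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# Score a new candidate sentence from a relevant document.
candidate = "Flood waters damaged dozens of homes."
pred = clf.predict([lexical_features("flood damage", candidate)])[0]
```

In the actual system the classifier would also consume semantic features and handle transcribed or translated input, but the framing — one feature vector and one relevant/not-relevant decision per candidate sentence — is the same.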