
Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization

Author: Chan Gromit Yeuk-Yin
Giles Clyde Lee
Hsu Ting-Yao
Huang Chieh-Yang
Huang Ting-Hao 'Kenneth'
Kim Sungchul
Koh Eunyee
Nenkova Ani
Rossi Ryan
Publication date: 11/08/2023

Good figure captions help paper readers understand complex scientific figures. Unfortunately, even published papers often have poorly written captions. Automatic caption generation could aid paper writers by providing good starting captions that can be refined for better quality. Prior work often treated figure caption generation as a vision-to-language task. In this paper, we show that it can be more effectively tackled as a text summarization task in scientific documents. We fine-tuned PEGASUS, a pre-trained abstractive summarization model, to specifically summarize figure-referencing paragraphs (e.g., "Figure 3 shows...") into figure captions. Experiments on large-scale arXiv figures show that our method outperforms prior vision methods in both automatic and human evaluations. We further conducted an in-depth investigation focused on two key challenges: (i) the common presence of low-quality author-written captions and (ii) the lack of clear standards for good captions. Our code and data are available at: https://github.com/Crowd-AI-Lab/Generating-Figure-Captions-as-a-Text-Summarization-Task.
Comment: Accepted by INLG-202

arXiv.org e-Print Archive
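The paragraph-selection step described in the Summaries as Captions abstract above (finding paragraphs such as "Figure 3 shows..." and using them as the text to summarize) can be sketched as follows. This is a minimal, hypothetical Python sketch, not the authors' released code: the regular expression, the function names `figure_referencing_paragraphs` and `build_summarization_input`, and the `max_chars` budget are all assumptions. The summarization model itself (a fine-tuned PEGASUS in the paper) is not shown.

```python
import re

# Assumed pattern for in-text figure mentions such as "Figure 3" or "Fig. 3a".
# Plural forms like "Figures 2 and 3" are deliberately not handled in this sketch.
FIG_REF = re.compile(r"\b(?:Figure|Fig\.?)\s*(\d+)", re.IGNORECASE)


def figure_referencing_paragraphs(paragraphs, figure_number):
    """Return, in document order, the paragraphs that mention `figure_number`."""
    selected = []
    for para in paragraphs:
        # findall returns the captured digit strings; compare them as integers
        # so that "Figure 13" is not mistaken for a mention of Figure 1 or 3.
        mentioned = {int(num) for num in FIG_REF.findall(para)}
        if figure_number in mentioned:
            selected.append(para)
    return selected


def build_summarization_input(paragraphs, figure_number, max_chars=2000):
    """Join the selected paragraphs into one source text for a summarizer,
    cut to a hypothetical character budget."""
    text = " ".join(figure_referencing_paragraphs(paragraphs, figure_number))
    return text[:max_chars]


if __name__ == "__main__":
    paras = [
        "Figure 3 shows the accuracy of each model.",
        "We describe the dataset in Section 2.",
        "As seen in Fig. 3, accuracy drops for long inputs.",
        "Figure 13 compares runtimes.",
    ]
    # Prints the first and third paragraphs joined by a space.
    print(build_summarization_input(paras, 3))
```

Passing the returned text to an abstractive summarizer would complete the pipeline; how the paper actually selects, orders, and truncates figure-referencing paragraphs may differ from this sketch.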

Multimedia information technology and the annotation of video

Author: Jong F.M.G. de
Smeulders A.
Worring M.
Publication venue: Stichting Archiefpublicaties
Publication date: 01/01/2006

The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, overload of data will cause lack of annotation capacity, and on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we make an attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning

University of Twente Research Information

Notes from the Field: The Role of Datasets in Transitional Justice Research: The Case of Brazilian Truth Commission

Author: Cesar Roberto M., Jr.
Mezarobba Glenda
Publication venue: Scholarship@Western
Publication date: 01/01/2016

In 2012, Brazilian President Dilma Rousseff installed the Brazilian Truth Commission (CNV) to address gross human rights violations that occurred between 1946 and 1988. One of the most important sources of information about this period is the collection of files from the agencies that made up the Brazilian intelligence system during the dictatorship. In total, the National Archives held around 12 million pages of relevant text. To make effective use of this trove of information, the CNV faced the challenge of applying data science tools to find useful information within this huge dataset. As a result, a prototype data repository of selected documents (PDFs, images, etc.) was created, which we summarize in this note. Computational tools for searching, organizing, and visualizing potentially important documents were developed and used to support CNV researchers. We also reflect on the issues that hampered the CNV's ability to access reliable and comprehensive data, and on the limitations of analysis conducted in this type of research.

Scholarship@Western

Crossref

Applications of integration of AI-based Optical Character Recognition (OCR) and Generative AI in Document Understanding and Processing

Author: Abdelaziz Tarek Ahmed Ibrahim
Fazil Urfa
Publication venue: ResearchBerg
Publication date: 08/11/2023

The adoption of AI-based Optical Character Recognition (OCR) and Generative AI can streamline document processing, shifting it from manual to automated digital methods and thereby increasing efficiency and accuracy in data handling. This study examines the applications of these technologies across various stages of document management. First, OCR can digitize scanned physical documents, transforming text images into machine-encoded text. This process is essential for converting paper-based records into digital formats. OCR can also decipher handwritten notes, making it invaluable for processing historical documents and manually filled forms. In the next phase, these technologies can categorize and organize data: AI algorithms combined with OCR can classify text into categories such as invoices, legal documents, or personal letters, streamlining document sorting and retrieval. Generative AI can further enhance this process by producing concise summaries of lengthy documents, enabling quick comprehension without reading the entire text. Error detection and correction are also critical applications. Even effective OCR can misread characters; AI algorithms can flag these errors by checking the recognized text against language models, and Generative AI can then suggest corrections, improving the accuracy of the digitized text. Moreover, the combination of OCR and Generative AI can be used for data extraction and analysis, such as pulling specific information out of documents or running sentiment analysis on texts like customer reviews to gain insight into customer opinions. For translation and localization, Generative AI can translate digitized text into other languages and adapt content to different cultural contexts, a capability crucial for international businesses.
AI also improves document accessibility by converting text to speech and adding interactive elements, making documents accessible to visually impaired users. For security and compliance, these technologies can identify and redact sensitive information to comply with privacy laws, and can verify the authenticity of documents to detect alterations. Finally, AI can generate customizable document templates and content tailored to specific needs and preferences. Together, these applications demonstrate the extensive impact of AI-based OCR and Generative AI on modern document processing and management.

ResearchBerg