Search CORE

2 research outputs found

Language Independent Acquisition of Abbreviations

Author: Chowdhury Md Faisal Mahbub
Glass Michael R.
Gliozzo Alfio M.
Publication venue
Publication date: 23/09/2017
Field of study

This paper addresses automatic extraction of abbreviations (encompassing acronyms and initialisms) and corresponding long-form expansions from plain unstructured text. We create and are going to release a multilingual resource for abbreviations and their corresponding expansions, built automatically by exploiting Wikipedia redirect and disambiguation pages, that can be used as a benchmark for evaluation. We address a shortcoming of previous work where only the redirect pages were used, and so every abbreviation had only a single expansion, even though multiple different expansions are possible for many of the abbreviations. We also develop a principled machine learning based approach to scoring expansion candidates using different techniques such as indicators of near synonymy, topical relatedness, and surface similarity. We show improved performance over seven languages, including two with a non-Latin alphabet, relative to strong baselines.Comment: 9 pages, 7 figues, 2 table

arXiv.org e-Print Archive

Understanding Scanned Receipts

Author: Melz Eric
Publication venue
Publication date: 04/05/2020
Field of study

Tasking machines with understanding receipts can have important applications such as enabling detailed analytics on purchases, enforcing expense policies, and inferring patterns of purchase behavior on large collections of receipts. In this paper, we focus on the task of Named Entity Linking (NEL) of scanned receipt line items; specifically, the task entails associating shorthand text from OCR'd receipts with a knowledge base (KB) of grocery products. For example, the scanned item "STO BABY SPINACH" should be linked to the catalog item labeled "Simple Truth Organic Baby Spinach". Experiments that employ a variety of Information Retrieval techniques in combination with statistical phrase detection shows promise for effective understanding of scanned receipt data.Comment: 8 pages, 3 figures, no conference submissio

arXiv.org e-Print Archive