2 research outputs found
Language Independent Acquisition of Abbreviations
This paper addresses automatic extraction of abbreviations (encompassing
acronyms and initialisms) and corresponding long-form expansions from plain
unstructured text. We create and will release a multilingual resource
for abbreviations and their corresponding expansions, built automatically by
exploiting Wikipedia redirect and disambiguation pages, that can be used as a
benchmark for evaluation. We address a shortcoming of previous work where only
the redirect pages were used, and so every abbreviation had only a single
expansion, even though multiple different expansions are possible for many of
the abbreviations. We also develop a principled machine-learning-based approach
to scoring expansion candidates using different techniques such as indicators
of near synonymy, topical relatedness, and surface similarity. We show improved
performance over seven languages, including two with a non-Latin alphabet,
relative to strong baselines.
Comment: 9 pages, 7 figures, 2 tables
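To make the "surface similarity" signal concrete, here is a minimal sketch in Python. It is not the scoring model from the paper. The function name `surface_similarity` and the word-initial matching rule are assumptions chosen for illustration; the abstract does not say how surface similarity is computed.

```python
def surface_similarity(abbreviation: str, expansion: str) -> float:
    """Fraction of abbreviation letters matched, in order, by word initials.

    Illustrative assumption: only the first character of each expansion
    word is considered, and matching is case-insensitive.
    """
    letters = [c.lower() for c in abbreviation if c.isalnum()]
    initials = [w[0].lower() for w in expansion.split()]
    if not letters:
        return 0.0

    matched = 0
    pos = 0
    for letter in letters:
        # Skip initials (e.g. of function words) until one matches.
        while pos < len(initials) and initials[pos] != letter:
            pos += 1
        if pos == len(initials):
            break  # ran out of expansion words
        matched += 1
        pos += 1
    return matched / len(letters)


# One abbreviation can have several plausible expansions, which is the
# situation the disambiguation pages are meant to capture.
for candidate in (
    "North Atlantic Treaty Organization",
    "National Association of Theatre Owners",
    "Not An Organization",  # made-up distractor
):
    print(candidate, surface_similarity("NATO", candidate))
```

A signal like this cannot separate the first two candidates, which suggests why the abstract combines it with near-synonymy and topical-relatedness indicators.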
Understanding Scanned Receipts
Tasking machines with understanding receipts can have important applications
such as enabling detailed analytics on purchases, enforcing expense policies,
and inferring patterns of purchase behavior on large collections of receipts.
In this paper, we focus on the task of Named Entity Linking (NEL) of scanned
receipt line items; specifically, the task entails associating shorthand text
from OCR'd receipts with a knowledge base (KB) of grocery products. For
example, the scanned item "STO BABY SPINACH" should be linked to the catalog
item labeled "Simple Truth Organic Baby Spinach". Experiments that employ a
variety of Information Retrieval techniques in combination with statistical
phrase detection show promise for effective understanding of scanned receipt
data.
Comment: 8 pages, 3 figures, no conference submission
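The linking task can be illustrated with a small Python sketch. It is not the retrieval pipeline from the paper: it assumes that receipt shorthand is formed by dropping characters from the catalog label, so it tests whether the receipt text is a character subsequence of each label. The names `link_item` and `CATALOG` are hypothetical, and apart from the label quoted in the abstract, the catalog entries are made up.

```python
from typing import List, Optional

# Hypothetical mini knowledge base; only the first label comes from the abstract.
CATALOG = [
    "Simple Truth Organic Baby Spinach",
    "Simple Truth Organic Baby Kale",
    "Baby Spinach Salad Kit",
]


def normalize(text: str) -> str:
    """Lowercase and keep only letters and digits."""
    return "".join(ch.lower() for ch in text if ch.isalnum())


def is_subsequence(short: str, long: str) -> bool:
    """True if the characters of `short` appear in `long` in the same order."""
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def link_item(receipt_text: str, catalog: List[str]) -> Optional[str]:
    """Return the catalog label that best explains the receipt shorthand.

    Score (an assumption): the share of the label's characters that the
    shorthand retains, so tighter matches rank higher.
    """
    query = normalize(receipt_text)
    best_label, best_score = None, 0.0
    for label in catalog:
        target = normalize(label)
        if target and is_subsequence(query, target):
            score = len(query) / len(target)
            if score > best_score:
                best_label, best_score = label, score
    return best_label


print(link_item("STO BABY SPINACH", CATALOG))
```

A subsequence test is brittle under OCR errors, since a single misread character rules out the correct label. That is one reason a fuzzier retrieval-style ranking, as the abstract describes, is the more natural fit.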