2 research outputs found
Acronym-Meaning Extraction from Corpora Using Multi-Tape Weighted Finite-State Machines
The automatic extraction of acronyms and their meaning from corpora is an
important sub-task of text mining. It can be seen as a special case of string
alignment, where a text chunk is aligned with an acronym. Alternative
alignments have different cost, and ideally the least costly one should give
the correct meaning of the acronym. We show how this approach can be
implemented by means of a 3-tape weighted finite-state machine (3-WFSM) which
reads a text chunk on tape 1 and an acronym on tape 2, and generates all
alternative alignments on tape 3. The 3-WFSM can be automatically generated
from a simple regular expression. No additional algorithms are required at any
stage. Our 3-WFSM has a size of 27 states and 64 transitions, and finds the
best analysis of an acronym in a few milliseconds.Comment: 6 pages, LaTe