45 research outputs found
Query by String word spotting based on character bi-gram indexing
In this paper we propose a segmentation-free query by string word spotting
method. Both the documents and query strings are encoded using a recently
proposed word representa- tion that projects images and strings into a common
atribute space based on a pyramidal histogram of characters(PHOC). These
attribute models are learned using linear SVMs over the Fisher Vector
representation of the images along with the PHOC labels of the corresponding
strings. In order to search through the whole page, document regions are
indexed per character bi- gram using a similar attribute representation. On top
of that, we propose an integral image representation of the document using a
simplified version of the attribute model for efficient computation. Finally we
introduce a re-ranking step in order to boost retrieval performance. We show
state-of-the-art results for segmentation-free query by string word spotting in
single-writer and multi-writer standard datasetsComment: To be published in ICDAR201
Sequence Mining and Pattern Analysis in Drilling Reports with Deep Natural Language Processing
Drilling activities in the oil and gas industry have been reported over
decades for thousands of wells on a daily basis, yet the analysis of this text
at large-scale for information retrieval, sequence mining, and pattern analysis
is very challenging. Drilling reports contain interpretations written by
drillers from noting measurements in downhole sensors and surface equipment,
and can be used for operation optimization and accident mitigation. In this
initial work, a methodology is proposed for automatic classification of
sentences written in drilling reports into three relevant labels (EVENT,
SYMPTOM and ACTION) for hundreds of wells in an actual field. Some of the main
challenges in the text corpus were overcome, which include the high frequency
of technical symbols, mistyping/abbreviation of technical terms, and the
presence of incomplete sentences in the drilling reports. We obtain
state-of-the-art classification accuracy within this technical language and
illustrate advanced queries enabled by the tool.Comment: 7 pages, 14 figures, technical repor