758 research outputs found
Attribute CNNs for Word Spotting in Handwritten Documents
Word spotting has become a field of strong research interest in document
image analysis in recent years. Recently, AttributeSVMs were proposed, which
predict a binary attribute representation. At the time, this influential
method defined the state-of-the-art in segmentation-based word spotting. In
this work, we present an approach for learning attribute representations with
Convolutional Neural Networks (CNNs). By taking a probabilistic perspective on
training CNNs, we derive two different loss functions for binary and
real-valued word string embeddings. In addition, we propose two different CNN
architectures specifically designed for word spotting, which can be trained
in an end-to-end fashion. In a number of experiments, we
investigate the influence of different word string embeddings and optimization
strategies. We show our Attribute CNNs to achieve state-of-the-art results for
segmentation-based word spotting on a large variety of data sets.
Comment: under review at IJDA
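The abstract derives two loss functions from a probabilistic view of CNN training, one for binary and one for real-valued word string embeddings. As a rough illustration (my assumption, not the authors' exact derivation), binary attribute vectors pair naturally with binary cross-entropy, while real-valued embeddings pair with a cosine loss:

```python
import math

def bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy over a predicted attribute vector.

    Suited to binary word string embeddings, where each output is the
    probability that one attribute (e.g. a character in a region) is present.
    """
    assert len(pred) == len(target)
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, target)) / len(pred)

def cosine_loss(pred, target, eps=1e-7):
    """1 - cosine similarity, for real-valued embeddings where only the
    direction of the embedding vector matters for retrieval."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred)) + eps
    norm_t = math.sqrt(sum(t * t for t in target)) + eps
    return 1.0 - dot / (norm_p * norm_t)
```

In a real setup these would be applied to the CNN's sigmoid or linear outputs; the sketch only shows the per-vector loss terms.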
A Computationally Efficient Pipeline Approach to Full Page Offline Handwritten Text Recognition
Offline handwriting recognition with deep neural networks is usually limited
to words or lines due to large computational costs. In this paper, a less
computationally expensive full page offline handwritten text recognition
framework is introduced. This framework includes a pipeline that locates
handwritten text with an object detection neural network and recognises the
text within the detected regions using features extracted with a multi-scale
convolutional neural network (CNN) fed into a bidirectional long short term
memory (LSTM) network. This framework achieves error rates comparable to
state-of-the-art frameworks while using less memory and time. The results in
this paper demonstrate the potential of this framework, and future work can
investigate production-ready, deployable handwritten text recognisers.
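A full-page pipeline like the one described must order the detector's boxes before line-level recognition can produce a coherent page transcription. A minimal sketch of one plausible reading-order heuristic (my assumption, not the paper's exact procedure), grouping boxes into text lines by vertical centre and sorting left-to-right within each line:

```python
def reading_order(boxes, line_tol=0.5):
    """Order detected text boxes top-to-bottom, then left-to-right.

    boxes: list of (x, y, w, h) axis-aligned boxes. Boxes whose vertical
    centres lie within line_tol * median height of the previous box are
    treated as belonging to the same text line.
    """
    if not boxes:
        return []
    heights = sorted(h for _, _, _, h in boxes)
    med_h = heights[len(heights) // 2]
    # group into lines by vertical centre
    ordered = sorted(boxes, key=lambda b: b[1] + b[3] / 2)
    lines, current = [], [ordered[0]]
    for b in ordered[1:]:
        prev = current[-1]
        if abs((b[1] + b[3] / 2) - (prev[1] + prev[3] / 2)) <= line_tol * med_h:
            current.append(b)
        else:
            lines.append(current)
            current = [b]
    lines.append(current)
    # within each line, read left to right
    return [b for line in lines for b in sorted(line, key=lambda b: b[0])]
```

Real systems handle skewed or multi-column pages with more care; this only covers the simple single-column case.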
Annotation-free Learning of Deep Representations for Word Spotting using Synthetic Data and Self Labeling
Word spotting is a popular tool for supporting the first exploration of
historic, handwritten document collections. Today, the best performing methods
rely on machine learning techniques, which require a high amount of annotated
training material. As training data is usually not available in the application
scenario, annotation-free methods aim at solving the retrieval task without
representative training samples. In this work, we present an annotation-free
method that still employs machine learning techniques and therefore outperforms
other learning-free approaches. The weakly supervised training scheme relies on
a lexicon that does not need to fit the dataset precisely. In combination with
a confidence-based selection of pseudo-labeled training samples, we achieve
state-of-the-art query-by-example performances. Furthermore, our method
supports query-by-string retrieval, which is usually not possible for other
annotation-free methods.
Comment: Accepted to Workshop on Document Analysis Systems (DAS) 202
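The confidence-based selection of pseudo-labeled samples can be sketched as follows. One plausible confidence criterion (an assumption for illustration, not necessarily the paper's exact measure) is the margin between the best and second-best lexicon match for each sample:

```python
def lexicon_confidence(scores):
    """Confidence of the best lexicon match as its margin over the runner-up.

    scores: dict mapping lexicon words to similarity scores (higher = better).
    Returns (best_word, margin).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0]
    (w1, s1), (_, s2) = ranked[0], ranked[1]
    return w1, s1 - s2

def select_pseudo_labels(samples, min_margin=0.2):
    """Keep only samples whose best lexicon match is confident enough.

    samples: list of (sample_id, scores-dict).
    Returns (sample_id, pseudo_label) pairs to use for retraining.
    """
    kept = []
    for sid, scores in samples:
        word, margin = lexicon_confidence(scores)
        if margin >= min_margin:
            kept.append((sid, word))
    return kept
```

Samples with ambiguous matches are discarded rather than risk training on a wrong transcription.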
R-PHOC: Segmentation-Free Word Spotting using CNN
This paper proposes a region-based convolutional neural network for
segmentation-free word spotting. Our network takes as input an image and a
set of word candidate bounding boxes and embeds all bounding boxes into an
embedding space, where word spotting can be cast as a simple nearest
neighbour search between the query representation and each of the candidate
bounding boxes. We make use of the PHOC embedding as it has previously
achieved significant success in segmentation-based word spotting. Word candidates are
generated using a simple procedure based on grouping connected components using
some spatial constraints. Experiments show that R-PHOC, which operates on
images directly, can improve the current state-of-the-art on the standard GW
dataset and in some cases performs as well as PHOCNet, which was designed for
segmentation-based word spotting.
Comment: Accepted in ICDAR'201
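The PHOC (Pyramidal Histogram of Characters) embedding mentioned above maps a word string to a binary vector: the word is split into 1, 2, 3, ... regions per pyramid level, and each region records which characters fall inside it. A minimal sketch of the standard construction, assuming a lowercase alphabet only (practical PHOCs also include digits and frequent bigrams):

```python
def phoc(word, alphabet="abcdefghijklmnopqrstuvwxyz", levels=(1, 2)):
    """Pyramidal Histogram of Characters for a word string.

    Each character occupies the interval [i/n, (i+1)/n] of the word;
    it is assigned to a pyramid region if at least half of that interval
    overlaps the region.
    """
    n = len(word)
    vec = []
    for level in levels:
        for r in range(level):
            lo, hi = r / level, (r + 1) / level
            bits = [0] * len(alphabet)
            for i, ch in enumerate(word):
                c_lo, c_hi = i / n, (i + 1) / n
                overlap = min(hi, c_hi) - max(lo, c_lo)
                if ch in alphabet and overlap / (c_hi - c_lo) >= 0.5:
                    bits[alphabet.index(ch)] = 1
            vec.extend(bits)
    return vec
```

A query string embedded this way can be compared against embedded bounding boxes with a simple nearest-neighbour search, which is exactly how spotting is cast in the abstract.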
Depression Severity Estimation from Multiple Modalities
Depression is a major debilitating disorder which can affect people of all
ages. With a continuous increase in the number of annual cases of depression,
there is a need to develop automatic techniques for the detection of the
presence and extent of depression. In this AVEC challenge we explore different
modalities (speech, language and visual features extracted from face) to design
and develop automatic methods for the detection of depression. In psychology
literature, the PHQ-8 questionnaire is well established as a tool for measuring
the severity of depression. In this paper we aim to automatically predict the
PHQ-8 scores from features extracted from the different modalities. We show
that visual features extracted from facial landmarks obtain the best
performance in terms of estimating the PHQ-8 results with a mean absolute error
(MAE) of 4.66 on the development set. Behavioral characteristics from speech
provide an MAE of 4.73. Language features yield a slightly higher MAE of 5.17.
When switching to the test set, our Turn Features derived from audio
transcriptions achieve the best performance, scoring an MAE of 4.11
(corresponding to an RMSE of 4.94), which makes our system the winner of the
AVEC 2017 depression sub-challenge.
Comment: 8 pages, 1 figure
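The two error metrics reported above are standard regression measures over the predicted PHQ-8 scores. As a quick reference implementation (not from the paper):

```python
import math

def mae(pred, true):
    """Mean absolute error between predicted and true PHQ-8 scores."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)

def rmse(pred, true):
    """Root mean squared error; penalises large errors more than MAE."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))
```

Since RMSE weights large deviations more heavily, it is always at least as large as MAE on the same predictions, consistent with the reported pair (MAE 4.11, RMSE 4.94).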
PHOCNet: A Deep Convolutional Neural Network for Word Spotting in Handwritten Documents
In recent years, deep convolutional neural networks have achieved
state-of-the-art performance in various computer vision tasks such as
classification, detection, and segmentation. Due to their outstanding performance, CNNs are more
and more used in the field of document image analysis as well. In this work, we
present a CNN architecture that is trained with the recently proposed PHOC
representation. We show empirically that our CNN architecture is able to
outperform state-of-the-art results for various word spotting benchmarks while
exhibiting short training and test times.
Comment: published as a conference paper at the International Conference on
Frontiers in Handwriting Recognition 201
WSRNet: Joint Spotting and Recognition of Handwritten Words
In this work, we present a unified model that can handle both Keyword
Spotting and Word Recognition with the same network architecture. The proposed
network comprises a non-recurrent CTC branch and a Seq2Seq branch that is
further augmented with an Autoencoding module. The related joint loss leads to
a boost in recognition performance, while the Seq2Seq branch is used to create
efficient word representations. We show how to further process these
representations with binarization and a retraining scheme to provide compact
and highly efficient descriptors, suitable for keyword spotting. Numerical
results validate the usefulness of the proposed architecture, as our method
outperforms the previous state-of-the-art in keyword spotting, and provides
results in the ballpark of the leading methods for word recognition.
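The abstract mentions binarizing the Seq2Seq representations to obtain compact descriptors for keyword spotting. A minimal sketch of the generic idea (sign binarization plus Hamming-distance ranking; the paper's actual binarization and retraining scheme is more involved):

```python
def binarize(embedding):
    """Sign-binarize a real-valued word embedding into a compact bit vector."""
    return [1 if x >= 0 else 0 for x in embedding]

def hamming(a, b):
    """Hamming distance between two bit vectors; a cheap ranking metric
    for keyword spotting over binarized descriptors."""
    return sum(x != y for x, y in zip(a, b))
```

Bit vectors can be packed into machine words, so ranking a query against millions of word images reduces to XOR-and-popcount operations.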
Neural Word Search in Historical Manuscript Collections
We address the problem of segmenting and retrieving word images in
collections of historical manuscripts given a text query. This is commonly
referred to as "word spotting". To this end, we first propose an end-to-end
trainable model based on deep neural networks that we dub Ctrl-F-Net. The model
simultaneously generates region proposals and embeds them into a word embedding
space, wherein a search is performed. We further introduce a simplified version
called Ctrl-F-Mini. It is faster with similar performance, though it is limited
to more easily segmented manuscripts. We evaluate both models on common
benchmark datasets and surpass the previous state of the art. Finally, in
collaboration with historians, we employ the Ctrl-F-Net to search within a
large manuscript collection of over 100 thousand pages, written across two
centuries. With only 11 training pages, we enable large-scale data collection
in manuscript-based historical research, speeding up data collection and
increasing the number of manuscripts processed by orders of magnitude.
Given the time-consuming manual work required to study old manuscripts in the
humanities, quick and robust tools for word spotting have the potential to
revolutionise domains like history, religion and language.
Comment: Extension of arXiv:1703.07645. This version adds results on two
additional benchmark datasets (Botany and Konzilsprotokolle) and improves the
experiment done in section 5.3.
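Models that generate region proposals, like Ctrl-F-Net, typically prune overlapping proposals before (or after) scoring them. A standard greedy non-maximum suppression sketch, offered as generic background rather than the paper's specific proposal mechanism:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring proposal,
    drop proposals overlapping it by IoU >= thresh, repeat.
    Returns the indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep
```

After suppression, the surviving regions are embedded into the word embedding space and searched against the query.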
Learning Deep Representations for Word Spotting Under Weak Supervision
Convolutional Neural Networks have made their mark in various fields of
computer vision in recent years. They have achieved state-of-the-art
performance in the field of document analysis as well. However, CNNs require a
large amount of annotated training data and, hence, great manual effort. In our
approach, we introduce a method to drastically reduce the manual annotation
effort while retaining the high performance of a CNN for word spotting in
handwritten documents. The model is learned with weak supervision using a
combination of synthetically generated training data and a small subset of the
training partition of the handwritten data set. We show that the network
achieves results highly competitive to the state-of-the-art in word spotting
with shorter training times and a fraction of the annotation effort.
Comment: submitted to DAS 201
Candidate Fusion: Integrating Language Modelling into a Sequence-to-Sequence Handwritten Word Recognition Architecture
Sequence-to-sequence models have recently become very popular for tackling
handwritten word recognition problems. However, how to effectively integrate
an external language model into such a recognizer is still a challenging
problem. The main challenge when training a language model is that its corpus
usually differs from the one used to train the handwritten word recognition
system. This mismatch between the two corpora can bias the transcriptions,
yielding similar or even worse recognition performance. In this work, we introduce
Candidate Fusion, a novel way to integrate an external language model into a
sequence-to-sequence architecture: the language model's suggestions are fed
as an additional input to the sequence-to-sequence recognizer. Hence,
Candidate Fusion provides two improvements. On the one hand,
the sequence-to-sequence recognizer has the flexibility not only to combine the
information from itself and the language model, but also to choose the
importance of the information provided by the language model. On the other
hand, the external language model can adapt itself to the training corpus and
even learn the most common errors produced by the recognizer. Finally,
comprehensive experiments show that Candidate Fusion outperforms
state-of-the-art language models for handwritten word recognition tasks.
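The general idea of blending recognizer and language-model evidence can be sketched with a simple weighted score combination. This is a generic shallow-fusion-style illustration under my own assumptions, not the learned fusion mechanism the abstract describes:

```python
def fuse_candidates(recognizer_scores, lm_scores, alpha=0.7):
    """Blend recognizer and language-model scores for candidate words.

    recognizer_scores / lm_scores: dicts mapping candidate words to scores.
    alpha controls how much the recognizer is trusted over the language model.
    Returns the candidates ranked by fused score, best first.
    """
    fused = {}
    for word, r in recognizer_scores.items():
        l = lm_scores.get(word, 0.0)
        fused[word] = alpha * r + (1 - alpha) * l
    return sorted(fused, key=fused.get, reverse=True)
```

With a fixed alpha the trade-off is static; the appeal of a learned integration, as described above, is that the recognizer itself decides how much weight the language model's suggestion deserves for each input.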