2,582 research outputs found

    Query by String word spotting based on character bi-gram indexing

    Full text link
    In this paper we propose a segmentation-free query by string word spotting method. Both the documents and query strings are encoded using a recently proposed word representa- tion that projects images and strings into a common atribute space based on a pyramidal histogram of characters(PHOC). These attribute models are learned using linear SVMs over the Fisher Vector representation of the images along with the PHOC labels of the corresponding strings. In order to search through the whole page, document regions are indexed per character bi- gram using a similar attribute representation. On top of that, we propose an integral image representation of the document using a simplified version of the attribute model for efficient computation. Finally we introduce a re-ranking step in order to boost retrieval performance. We show state-of-the-art results for segmentation-free query by string word spotting in single-writer and multi-writer standard datasetsComment: To be published in ICDAR201

    Colour Text Segmentation in Web Images Based on Human Perception

    No full text
    There is a significant need to extract and analyse the text in images on Web documents, for effective indexing, semantic analysis and even presentation by non-visual means (e.g., audio). This paper argues that the challenging segmentation stage for such images benefits from a human perspective of colour perception in preference to RGB colour space analysis. The proposed approach enables the segmentation of text in complex situations such as in the presence of varying colour and texture (characters and background). More precisely, characters are segmented as distinct regions with separate chromaticity and/or lightness by performing a layer decomposition of the image. The method described here is a result of the authors’ systematic approach to approximate the human colour perception characteristics for the identification of character regions. In this instance, the image is decomposed by performing histogram analysis of Hue and Lightness in the HLS colour space and merging using information on human discrimination of wavelength and luminance

    Visual Representation of Text in Web Documents and Its Interpretation

    No full text
    This paper examines the uses of text and its representation on Web documents in terms of the challenges in its interpretation. Particular attention is paid to the significant problem of non-uniform representation of text. This non-uniformity is mainly due to the presence of semantically important text in image form as opposed to the standard encoded text. The issues surrounding text representation in Web documents are discussed in the context of colour perception and spatial representation. The characteristics of the representation of text in image form are examined and research towards interpreting these images of text is briefly described

    Visual Representation of Text in Web Documents and Its Interpretation

    No full text
    This paper examines the uses of text and its representation on Web documents in terms of the challenges in its interpretation. Particular attention is paid to the significant problem of non-uniform representation of text. This non-uniformity is mainly due to the presence of semantically important text in image form as opposed to the standard encoded text. The issues surrounding text representation in Web documents are discussed in the context of colour perception and spatial representation. The characteristics of the representation of text in image form are examined and research towards interpreting these images of text is briefly described

    Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval

    Full text link
    This paper presents a new state-of-the-art for document image classification and retrieval, using features learned by deep convolutional neural networks (CNNs). In object and scene analysis, deep neural nets are capable of learning a hierarchical chain of abstraction from pixel inputs to concise and descriptive representations. The current work explores this capacity in the realm of document analysis, and confirms that this representation strategy is superior to a variety of popular hand-crafted alternatives. Experiments also show that (i) features extracted from CNNs are robust to compression, (ii) CNNs trained on non-document images transfer well to document analysis tasks, and (iii) enforcing region-specific feature-learning is unnecessary given sufficient training data. This work also makes available a new labelled subset of the IIT-CDIP collection, containing 400,000 document images across 16 categories, useful for training new CNNs for document analysis

    Smartphone picture organization: a hierarchical approach

    Get PDF
    We live in a society where the large majority of the population has a camera-equipped smartphone. In addition, hard drives and cloud storage are getting cheaper and cheaper, leading to a tremendous growth in stored personal photos. Unlike photo collections captured by a digital camera, which typically are pre-processed by the user who organizes them into event-related folders, smartphone pictures are automatically stored in the cloud. As a consequence, photo collections captured by a smartphone are highly unstructured and because smartphones are ubiquitous, they present a larger variability compared to pictures captured by a digital camera. To solve the need of organizing large smartphone photo collections automatically, we propose here a new methodology for hierarchical photo organization into topics and topic-related categories. Our approach successfully estimates latent topics in the pictures by applying probabilistic Latent Semantic Analysis, and automatically assigns a name to each topic by relying on a lexical database. Topic-related categories are then estimated by using a set of topic-specific Convolutional Neuronal Networks. To validate our approach, we ensemble and make public a large dataset of more than 8,000 smartphone pictures from 40 persons. Experimental results demonstrate major user satisfaction with respect to state of the art solutions in terms of organization.Peer ReviewedPreprin

    Information extraction from multimedia web documents: an open-source platform and testbed

    No full text
    The LivingKnowledge project aimed to enhance the current state of the art in search, retrieval and knowledge management on the web by advancing the use of sentiment and opinion analysis within multimedia applications. To achieve this aim, a diverse set of novel and complementary analysis techniques have been integrated into a single, but extensible software platform on which such applications can be built. The platform combines state-of-the-art techniques for extracting facts, opinions and sentiment from multimedia documents, and unlike earlier platforms, it exploits both visual and textual techniques to support multimedia information retrieval. Foreseeing the usefulness of this software in the wider community, the platform has been made generally available as an open-source project. This paper describes the platform design, gives an overview of the analysis algorithms integrated into the system and describes two applications that utilise the system for multimedia information retrieval
    corecore