14,813 research outputs found

    Learning Object Categories From Internet Image Searches

    In this paper, we describe a simple approach to learning models of visual object categories from images gathered from Internet image search engines. The images returned for a given keyword are typically highly variable, with a large fraction unrelated to the query term, and thus pose a challenging environment from which to learn. By training our models directly on Internet images, we remove the need to laboriously compile training data sets, as required by most other recognition approaches; this opens up the possibility of learning object category models "on the fly." We describe two simple approaches, derived from the probabilistic latent semantic analysis (pLSA) technique for text document analysis, that can be used to automatically learn object models from these data. We show two applications of the learned model: first, to rerank the images returned by the search engine, thus improving the quality of the search results; and second, to recognize objects in other image data sets.
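As a rough illustration of the technique named above, pLSA can be fitted by expectation-maximization on a count matrix (documents by words, or images by visual words). The matrix, topic count, and iteration budget below are toy assumptions for the sketch, not the paper's actual implementation:

```python
import numpy as np

def plsa(counts, n_topics=2, n_iter=100, seed=0):
    """Fit pLSA by EM on a (n_docs, n_words) count matrix.

    Returns P(z|d) with shape (n_docs, n_topics) and
    P(w|z) with shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w), shape (n_docs, n_words, n_topics)
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        joint /= joint.sum(axis=2, keepdims=True) + 1e-12
        # M-step: re-estimate both factors from responsibility-weighted counts
        weighted = counts[:, :, None] * joint
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z
```

Reranking the search results then amounts to sorting images by P(z|d) for the topic identified with the object category, so images dominated by the "background" topic fall to the bottom of the list.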

    A Convex Model for Edge-Histogram Specification with Applications to Edge-preserving Smoothing

    The goal of edge-histogram specification is to find an image whose edge image has a histogram that matches a given edge-histogram as closely as possible. Mignotte has proposed a non-convex model for the problem [M. Mignotte. An energy-based model for the image edge-histogram specification problem. IEEE Transactions on Image Processing, 21(1):379--386, 2012]. In his work, the edge magnitudes of an input image are first modified by histogram specification to match the given edge-histogram. Then, a non-convex model is minimized to find an output image whose edge-histogram matches the modified edge-histogram. The non-convexity of the model hinders the computations and the inclusion of useful constraints such as the dynamic range constraint. In this paper, instead of considering edge magnitudes, we directly consider the image gradients and propose a convex model based on them. Furthermore, we include additional constraints in our model based on different applications. The convexity of our model allows us to compute the output image efficiently using either the Alternating Direction Method of Multipliers (ADMM) or the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA). We consider several applications in edge-preserving smoothing, including image abstraction, edge extraction, detail exaggeration, and document scan-through removal. Numerical results are given to illustrate that our method efficiently produces good results.
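The FISTA solver mentioned in the abstract can be sketched in a few lines for a generic composite objective. The sketch below applies it to a toy l1-regularized least-squares problem rather than the paper's gradient-based model; the operator, data, and regularization weight are all illustrative:

```python
import numpy as np

def fista(A, b, lam, n_iter=200):
    """FISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1 (generic sketch)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)           # gradient of the smooth part at y
        z = y - grad / L                   # forward (gradient) step
        # backward step: proximal operator of lam*||.||_1 is soft-thresholding
        x_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x
```

For the paper's actual model, the smooth term and proximal operator would be replaced by the gradient-matching data term and the projection enforcing the chosen constraints (e.g., the dynamic range constraint), which convexity makes straightforward to incorporate.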

    Finding Images of Rare and Ambiguous Entities


    Machine Learning Applied to Improving the Accessibility of PDF Documents for Visually Impaired Users

    Advisor: Luiz Cesar Martini. Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
    Abstract: Digital documents are accessed by visually impaired people (VIP) through screen readers. Traditionally, digital documents were translated into braille text, but screen readers have proved to be efficient for the acquisition of digital document knowledge by VIP. However, screen readers and other assistive technologies have significant limitations when tables are present in digital documents such as Portable Document Format (PDF) files. For instance, screen readers cannot follow the correct reading sequence of a table based on its visual structure, making this content inaccessible to VIP. To deal with this problem, in this work we developed a system for the retrieval of table information from PDF documents for use in the screen readers used by visually impaired people. The proposed methodology takes advantage of computer vision techniques with a deep learning approach to make documents accessible, instead of the classical rule-based programming approach. We explain in detail the methodology we used and how to objectively evaluate the approach through entropy, information gain, and purity metrics. The results show that our proposed methodology can be used to reduce the uncertainty experienced by visually impaired people when listening to the contents of tables in digital documents through screen readers. Our table information retrieval system presents two improvements compared with traditional approaches to tagging text-based PDF files. First, our approach does not require supervision by sighted people. Second, our system is capable of working with image-based as well as text-based PDFs. Master's degree in Computer Engineering (Mestre em Engenharia Elétrica).
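The evaluation metrics named in the abstract (entropy, information gain, and purity) are standard measures for comparing a predicted grouping against reference labels. A minimal sketch of how they might be computed follows; the toy labels are invented, and the dissertation's exact formulation may differ:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a discrete label assignment."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def purity(clusters, labels):
    """Fraction of items falling under the majority true label of their cluster."""
    clusters, labels = np.asarray(clusters), np.asarray(labels)
    total = 0
    for c in np.unique(clusters):
        _, counts = np.unique(labels[clusters == c], return_counts=True)
        total += counts.max()
    return total / len(labels)

def information_gain(clusters, labels):
    """Reduction in label entropy after conditioning on the cluster assignment."""
    clusters, labels = np.asarray(clusters), np.asarray(labels)
    conditional = 0.0
    for c in np.unique(clusters):
        mask = clusters == c
        conditional += mask.mean() * entropy(labels[mask])
    return entropy(labels) - conditional
```

A perfect grouping gives purity 1 and information gain equal to the label entropy; a grouping that ignores the labels gives information gain 0, which matches the abstract's framing of "reducing uncertainty."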

    Saliency for Image Description and Retrieval

    We live in a world where we are surrounded by ever-increasing numbers of images. More often than not, these images have very little metadata by which they can be indexed and searched. In order to avoid information overload, techniques need to be developed to enable these image collections to be searched by their content. Much of the previous work on image retrieval has used global features such as colour and texture to describe the content of the image. However, these global features are insufficient to accurately describe the image content when different parts of the image have different characteristics. This thesis initially discusses how this problem can be circumvented by using salient interest regions to select the areas of the image that are most interesting, and by generating local descriptors to describe the image characteristics in those regions. The thesis discusses a number of saliency detectors suitable for robust retrieval purposes and compares several of these region detectors. The thesis then discusses how salient regions can be used for image retrieval using a number of techniques, most importantly two techniques inspired by the field of textual information retrieval. Using these robust retrieval techniques, a new paradigm in image retrieval is discussed, whereby the retrieval takes place on a mobile device using a query image captured by a built-in camera. This paradigm is demonstrated in the context of an art gallery, in which the device can be used to find more information about particular images. The final chapter of the thesis discusses some approaches to bridging the semantic gap in image retrieval. The chapter explores ways in which un-annotated image collections can be searched by keyword. Two techniques are discussed: the first explicitly attempts to automatically annotate the un-annotated images so that the automatically applied annotations can be used for searching.
    The second approach does not try to explicitly annotate images; rather, through the use of linear algebra, it attempts to create a semantic space in which images and keywords are positioned such that images lie close to the keywords that represent them within the space.
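The linear-algebra construction described above resembles latent semantic analysis: a truncated SVD of an image-by-keyword annotation matrix places images and keywords in one shared low-dimensional space, where proximity encodes association. The following toy sketch illustrates the idea; the keywords and annotation counts are invented for the example:

```python
import numpy as np

# Toy annotation matrix: rows = images, columns = keywords,
# entries = how often a keyword was applied to an image.
keywords = ["sunset", "beach", "city", "night"]   # hypothetical vocabulary
X = np.array([
    [3.0, 2.0, 0.0, 0.0],   # image 0: annotated "sunset"/"beach"
    [2.0, 3.0, 0.0, 0.0],   # image 1
    [0.0, 0.0, 2.0, 1.0],   # image 2: annotated "city"/"night"
    [0.0, 0.0, 1.0, 2.0],   # image 3
])

# Truncated SVD: keep the top-k singular directions as the semantic space.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
img_vecs = U[:, :k] * s[:k]     # image coordinates in the semantic space
kw_vecs = Vt[:k, :].T * s[:k]   # keyword coordinates in the same space

def cosine(a, b):
    """Cosine similarity between two vectors in the semantic space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Keyword search then reduces to ranking images by their cosine similarity to the query keyword's vector, which is how un-annotated images can surface for a keyword they were never explicitly tagged with.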