58 research outputs found
Recuperação de informação multimodal em repositórios de imagem médica
The proliferation of digital medical imaging modalities in hospitals and other
diagnostic facilities has created huge repositories of valuable data, often
not fully explored. Moreover, the past few years show a growing trend
of data production. As such, studying new ways to index, process and
retrieve medical images becomes an important subject to be addressed by
the wider community of radiologists, scientists and engineers. Content-based
image retrieval, which encompasses various methods, can exploit the visual
information of a medical imaging archive, and is known to be beneficial to
practitioners and researchers. However, the integration of the latest systems
for medical image retrieval into clinical workflows is still rare, and their
effectiveness still show room for improvement.
This thesis proposes solutions and methods for multimodal information
retrieval, in the context of medical imaging repositories. The major
contributions are a search engine for medical imaging studies supporting
multimodal queries in an extensible archive; a framework for automated
labeling of medical images for content discovery; and an assessment and
proposal of feature learning techniques for concept detection from medical
images, exhibiting greater potential than feature extraction algorithms that
were pertinently used in similar tasks. These contributions, each in their
own dimension, seek to narrow the scientific and technical gap towards
the development and adoption of novel multimodal medical image retrieval
systems, to ultimately become part of the workflows of medical practitioners,
teachers, and researchers in healthcare.A proliferação de modalidades de imagem médica digital, em hospitais,
clínicas e outros centros de diagnóstico, levou à criação de enormes
repositórios de dados, frequentemente não explorados na sua totalidade.
Além disso, os últimos anos revelam, claramente, uma tendência para o
crescimento da produção de dados. Portanto, torna-se importante estudar
novas maneiras de indexar, processar e recuperar imagens médicas, por
parte da comunidade alargada de radiologistas, cientistas e engenheiros. A
recuperação de imagens baseada em conteúdo, que envolve uma grande
variedade de métodos, permite a exploração da informação visual num
arquivo de imagem médica, o que traz benefícios para os médicos e
investigadores. Contudo, a integração destas soluções nos fluxos de trabalho
é ainda rara e a eficácia dos mais recentes sistemas de recuperação de
imagem médica pode ser melhorada.
A presente tese propõe soluções e métodos para recuperação de informação
multimodal, no contexto de repositórios de imagem médica. As contribuições
principais são as seguintes: um motor de pesquisa para estudos de imagem
médica com suporte a pesquisas multimodais num arquivo extensível; uma
estrutura para a anotação automática de imagens; e uma avaliação e
proposta de técnicas de representation learning para deteção automática de
conceitos em imagens médicas, exibindo maior potencial do que as técnicas
de extração de features visuais outrora pertinentes em tarefas semelhantes.
Estas contribuições procuram reduzir as dificuldades técnicas e científicas
para o desenvolvimento e adoção de sistemas modernos de recuperação de
imagem médica multimodal, de modo a que estes façam finalmente parte
das ferramentas típicas dos profissionais, professores e investigadores da área
da saúde.Programa Doutoral em Informátic
A Location-Aware Middleware Framework for Collaborative Visual Information Discovery and Retrieval
This work addresses the problem of scalable location-aware distributed indexing to enable the leveraging of collaborative effort for the construction and maintenance of world-scale visual maps and models which could support numerous activities including navigation, visual localization, persistent surveillance, structure from motion, and hazard or disaster detection. Current distributed approaches to mapping and modeling fail to incorporate global geospatial addressing and are limited in their functionality to customize search. Our solution is a peer-to-peer middleware framework based on XOR distance routing which employs a Hilbert Space curve addressing scheme in a novel distributed geographic index. This allows for a universal addressing scheme supporting publish and search in dynamic environments while ensuring global availability of the model and scalability with respect to geographic size and number of users. The framework is evaluated using large-scale network simulations and a search application that supports visual navigation in real-world experiments
Information fusion in content based image retrieval: A comprehensive overview
An ever increasing part of communication between persons involve the use of pictures, due to the cheap availability of powerful cameras on smartphones, and the cheap availability of storage space. The rising popularity of social networking applications such as Facebook, Twitter, Instagram, and of instant messaging applications, such as WhatsApp, WeChat, is the clear evidence of this phenomenon, due to the opportunity of sharing in real-time a pictorial representation of the context each individual is living in. The media rapidly exploited this phenomenon, using the same channel, either to publish their reports, or to gather additional information on an event through the community of users. While the real-time use of images is managed through metadata associated with the image (i.e., the timestamp, the geolocation, tags, etc.), their retrieval from an archive might be far from trivial, as an image bears a rich semantic content that goes beyond the description provided by its metadata. It turns out that after more than 20 years of research on Content-Based Image Retrieval (CBIR), the giant increase in the number and variety of images available in digital format is challenging the research community. It is quite easy to see that any approach aiming at facing such challenges must rely on different image representations that need to be conveniently fused in order to adapt to the subjectivity of image semantics. This paper offers a journey through the main information fusion ingredients that a recipe for the design of a CBIR system should include to meet the demanding needs of users
Deep Image Retrieval: A Survey
In recent years a vast amount of visual content has been generated and shared
from various fields, such as social media platforms, medical images, and
robotics. This abundance of content creation and sharing has introduced new
challenges. In particular, searching databases for similar content, i.e.content
based image retrieval (CBIR), is a long-established research area, and more
efficient and accurate methods are needed for real time retrieval. Artificial
intelligence has made progress in CBIR and has significantly facilitated the
process of intelligent search. In this survey we organize and review recent
CBIR works that are developed based on deep learning algorithms and techniques,
including insights and techniques from recent papers. We identify and present
the commonly-used benchmarks and evaluation methods used in the field. We
collect common challenges and propose promising future directions. More
specifically, we focus on image retrieval with deep learning and organize the
state of the art methods according to the types of deep network structure, deep
features, feature enhancement methods, and network fine-tuning strategies. Our
survey considers a wide variety of recent methods, aiming to promote a global
view of the field of instance-based CBIR.Comment: 20 pages, 11 figure
Deep image representations for instance search
We address the problem of visual instance search, which consists to retrieve all
the images within an dataset that contain a particular visual example provided to
the system. The traditional approach of processing the image content for this task
relied on extracting local low-level information within images that was “manually
engineered” to be invariant to di↵erent image conditions. One of the most popular
approaches uses the Bag of Visual Words (BoW) model on the local features to
aggregate the local information into a single representation. Usually, a final reranking stage is included in the pipeline to refine the search results. Since the
emergence of deep learning as the dominant technique in computer vision in 2012,
much research attention has been focused on deriving image representations from
Convolutional Neural Networks (CNN) models for the task of instance search as a
“data driven” approach to designing image representations. However, one of the main
challenges in the instance search task is the lack of annotated datasets to fit CNN
models parameters.
This work explores the capabilities of descriptors derived from pre-trained CNN
models for image classification to address the task of instance retrieval. First, we
conduct an investigation of the traditional bag of visual words encoding on local
CNN features to produce a scalable image retrieval framework that generalizes well
across di↵erent retrieval domains. Second, we propose to improve the capacity of the
obtained representations by exploring an unsupervised fine-tuning strategy that allow
us to obtain better performing representations at the price of losing the generalization
of the representations. Finally, we propose using visual attention models to weight
the contribution of the relevant parts of an image to obtain a very powerful image
representation for instance retrieval without requiring the construction of a large
and suitable training dataset for fine-tuning CNN architectures
Image sense disambiguation : a multimodal approach
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.Cataloged from PDF version of thesis.Includes bibliographical references (p. 131-136).If a picture is worth a thousand words, can a thousand words be worth a training image? Most successful object recognition algorithms require manually annotated images of objects to be collected for training. The amount of human effort required to collect training data has limited most approaches to the several hundred object categories available in the labeled datasets. While human-annotated image data is scarce, additional sources of information can be used as weak labels, reducing the need for human supervision. In this thesis, we use three types of information to learn models of object categories: speech, text and dictionaries. We demonstrate that our use of non-traditional information sources facilitates automatic acquisition of visual object models for arbitrary words without requiring any labeled image examples. Spoken object references occur in many scenarios: interaction with an assistant robot, voice-tagging of photos, etc. Existing reference resolution methods are unimodal, relying either only on image features, or only on speech recognition. We propose a method that uses both the image of the object and the speech segment referring to it to disambiguate the underlying object label. We show that even noisy speech input helps visual recognition, and vice versa. We also explore two sources of linguistic sense information: the words surrounding images on web pages, and dictionary entries for nouns that refer to objects. Keywords that index images on the web have been used as weak object labels, but these tend to produce noisy datasets with many unrelated images. We use unlabeled text, dictionary definitions, and semantic relations between concepts to learn a refined model of image sense. Our model can work with as little supervision as a single English word. We apply this model to a dataset of web images indexed by polysemous keywords, and show that it improves both retrieval of specific senses, and the resulting object classifiers.by Kate Saenko.Ph.D
Recommended from our members
Natural scene classification, annotation and retrieval. Developing different approaches for semantic scene modelling based on Bag of Visual Words.
With the availability of inexpensive hardware and software, digital imaging has become an important medium of communication in our daily lives. A huge amount of digital images are being collected and become available through the internet and stored in various fields such as personal image collections, medical imaging, digital arts etc. Therefore, it is important to make sure that images are stored, searched and accessed in an efficient manner. The use of bag of visual words (BOW) model for modelling images based on local invariant features computed at interest point locations has become a standard choice for many computer vision tasks. Based on this promising model, this thesis investigates three main problems: natural scene classification, annotation and retrieval. Given an image, the task is to design a system that can determine to which class that image belongs to (classification), what semantic concepts it contain (annotation) and what images are most similar to (retrieval).
This thesis contributes to scene classification by proposing a weighting approach, named keypoints density-based weighting method (KDW), to control the fusion of colour information and bag of visual words on spatial pyramid layout in a unified framework. Different configurations of BOW, integrated visual vocabularies and multiple image descriptors are investigated and analyzed. The proposed approaches are extensively evaluated over three well-known scene classification datasets with 6, 8 and 15 scene categories using 10-fold cross validation. The second contribution in this thesis, the scene annotation task, is to explore whether the integrated visual vocabularies generated for scene classification can be used to model the local semantic information of natural scenes. In this direction, image annotation is considered as a classification problem where images are partitioned into 10x10 fixed grid and each block, represented by BOW and different image descriptors, is classified into one of predefined semantic classes. An image is then represented by counting the percentage of every semantic concept detected in the image. Experimental results on 6 scene categories demonstrate the effectiveness of the proposed approach. Finally, this thesis further explores, with an extensive experimental work, the use of different configurations of the BOW for natural scene retrieval.Applied Science University in Jorda
A system for large-scale image and video retrieval on everyday scenes
There has been a growing amount of multimedia data generated on the web todayin terms of size and diversity. This has made accurate content retrieval with these large and complex collections of data a challenging problem. Motivated by the need for systems that can enable scalable and efficient search, we propose QIK (Querying Images Using Contextual Knowledge). QIK leverages advances in deep learning (DL) and natural language processing (NLP) for scene understanding to enable large-scale multimedia retrieval on everyday scenes with common objects. The system consists of three major components: Indexer, Query Processor, and Video Processor. Given an image, the Indexer performs probabilistic image understanding (PIU). The PIU generated consists of the most probable captions, parsed and represented by tree structures using NLP techniques, and detected objects. The PIU's are stored and indexed in a database system. For a query image, the Query Processor generates the most probable caption and parses it into the corresponding tree structure. Then an optimized tree-pattern query is constructed and executed on the database to retrieve a set of candidate images. The candidate images fetched are ranked using the tree-edit distance metric computed on the tree structures. Given a video, the Video Processor extracts a sequence of key scenes that are posed to the Query Processor to retrieve a set of candidate scenes. The candidate scene parse trees corresponding to a video are extracted and are ranked based on the number of matching scenes. We evaluated the performance of our system for large-scale image and video retrieval tasks on datasets containing everyday scenes and observed that our system could outperform state-ofthe- art techniques in terms of mean average precision.Includes bibliographical references
- …