Deep Autoencoders for Cross-Modal Retrieval
The increased accuracy and affordability of depth sensors such as the Kinect have created a rich depth-data source for 3D processing. In particular, 3D model retrieval is attracting attention in computer vision and pattern recognition due to its numerous applications. A cross-domain retrieval approach such as depth-image-based 3D model retrieval faces the challenges of occlusion, noise, and view variability present in both query and training data. In this research, we propose a new supervised deep autoencoder approach, followed by semantic modeling, to retrieve 3D shapes based on depth images. The key novelty is a two-fold feature abstraction to cope with the incompleteness and ambiguity present in depth images. First, we develop a supervised autoencoder to extract robust features from both real depth images and synthetic ones rendered from 3D models, intended to balance the reconstruction and classification capabilities on mixed-domain data. We investigate the relation between encoder and decoder layers in a deep autoencoder and claim that an asymmetric structure of a supervised deep autoencoder is more capable of extracting robust features than a symmetric one. The asymmetric deep autoencoder features are less sensitive to small sample changes in mixed-domain data. In addition, semantic modeling of the supervised autoencoder features offers the next level of abstraction against the incompleteness and ambiguity of the depth data. Interestingly, unlike pairwise model structures, cross-domain retrieval is still possible using only one single deep network trained on real and synthetic data. The experimental results on the NYUD2 and ModelNet10 datasets demonstrate that the proposed supervised method outperforms recent approaches for cross-modal 3D model retrieval.
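The joint reconstruction-plus-classification objective this abstract describes can be sketched in a few lines. This is a minimal illustration only: the layer sizes, the loss weight `alpha`, and the random features are assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def layer(n_in, n_out):
    # Small random weights; a trained model would learn these.
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

d_in, d_code, n_classes = 256, 32, 10  # hypothetical sizes

# Asymmetric structure: a deep encoder (256 -> 128 -> 64 -> 32)
# paired with a shallow, single-layer decoder (32 -> 256).
encoder = [layer(d_in, 128), layer(128, 64), layer(64, d_code)]
decoder = [layer(d_code, d_in)]
cls_W, cls_b = layer(d_code, n_classes)  # classification head on the code

def encode(x):
    h = x
    for W, b in encoder:
        h = relu(h @ W + b)
    return h

def decode(z):
    h = z
    for W, b in decoder:
        h = h @ W + b
    return h

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def supervised_ae_loss(x, y, alpha=0.5):
    """Balance reconstruction and classification on a mixed-domain batch."""
    z = encode(x)
    recon_loss = np.mean((decode(z) - x) ** 2)
    probs = softmax(z @ cls_W + cls_b)
    ce_loss = -np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12))
    return alpha * recon_loss + (1.0 - alpha) * ce_loss

# A mixed batch standing in for real depth images and synthetic renders.
x = rng.normal(size=(8, d_in))
y = rng.integers(0, n_classes, size=8)
loss = supervised_ae_loss(x, y)
```

Training would minimize this combined loss over both domains with a single network, which is what lets one model serve real and synthetic inputs at retrieval time.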
Semi-supervised 3D Video Information Retrieval with Deep Neural Network and Bi-directional Dynamic-time Warping Algorithm
This paper presents a novel semi-supervised deep learning algorithm for retrieving similar 2D and 3D videos based on visual content. The proposed approach combines the power of deep convolutional and recurrent neural networks with dynamic time warping as a similarity measure. The algorithm is designed to handle large video datasets and to retrieve the videos most related to a given query clip based on its frames and contents. We split both the candidate and the query videos into sequences of clips and convert each clip to a representation vector using an autoencoder-backed deep neural network. We then calculate a similarity measure between the sequences of embedding vectors using a bi-directional dynamic time warping method. The approach is tested on multiple public datasets, including CC_WEB_VIDEO, Youtube-8m, S3DIS, and Synthia, and shows good results compared with the state of the art. The algorithm effectively solves video retrieval tasks and outperforms the benchmarked state-of-the-art deep learning model.
Comment: 10 pages, submitted to IEEE Conference Big Data 202
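The clip-embedding-plus-DTW pipeline above can be illustrated with a small sketch. The embeddings here are random placeholders for the autoencoder outputs, and the "bi-directional" step is one plausible reading (aligning the candidate both forward and time-reversed), not necessarily the paper's exact formulation.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between two sequences of clip embeddings."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # per-clip distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def bidirectional_dtw(query, candidate):
    # Assumed reading of "bi-directional": align the candidate in both its
    # original and time-reversed order, keeping the better alignment.
    return min(dtw_distance(query, candidate),
               dtw_distance(query, candidate[::-1]))

# Rank candidate videos (each a sequence of clip embeddings) against a query.
rng = np.random.default_rng(0)
query = rng.normal(size=(5, 16))          # 5 clips, 16-dim embeddings
candidates = [rng.normal(size=(6, 16)) for _ in range(3)] + [query.copy()]
ranking = sorted(range(len(candidates)),
                 key=lambda i: bidirectional_dtw(query, candidates[i]))
```

Because DTW tolerates differing sequence lengths and local tempo changes, the candidate identical to the query ranks first even when other candidates have more clips.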
Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval
In this paper, we propose a novel deep generative approach to cross-modal retrieval that learns hash functions, in the absence of paired training samples, through a cycle consistency loss. The proposed approach employs an adversarial training scheme to learn a couple of hash functions enabling translation between modalities while assuming the underlying semantic relationship. To endow the hash codes of the input-output pair with semantics, a cycle consistency loss is further imposed on top of the adversarial training to strengthen the correlations between inputs and their corresponding outputs. Our approach is generative: the learned hash codes maximally correlate each input-output correspondence, while also being able to regenerate the inputs so as to minimize the information loss. Learning to hash is thus performed by jointly optimizing the parameters of the hash functions across modalities together with the associated generative models. Extensive experiments on a variety of large-scale cross-modal datasets demonstrate that our proposed method achieves better retrieval results than the state-of-the-art methods.
Comment: To appear in IEEE Trans. Image Processing. arXiv admin note: text overlap with arXiv:1703.10593 by other authors
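The two key ingredients of this abstract, binary hash codes from a cross-modal translation and a cycle-consistency penalty, can be sketched as follows. The linear maps `F` and `G` are random stand-ins for the adversarially trained generators, and all dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d_img, d_txt, n_bits = 64, 32, 16  # hypothetical feature and code sizes

# Hypothetical linear "generators" translating between modalities; in the
# paper these are trained adversarially, here they are random stand-ins.
F = rng.normal(0.0, 0.1, (d_img, d_txt))     # image features -> text space
G = rng.normal(0.0, 0.1, (d_txt, d_img))     # text features  -> image space
W_h = rng.normal(0.0, 0.1, (d_txt, n_bits))  # shared hashing projection

def hash_code(t):
    """Binarize a text-space representation into a {-1, +1} hash code."""
    return np.where(t @ W_h >= 0, 1, -1)

def cycle_loss(x):
    """Cycle consistency: translating image -> text -> image should recover
    the input, so the codes retain enough information to regenerate it."""
    return np.mean((x - (x @ F) @ G) ** 2)

x = rng.normal(size=(4, d_img))  # a batch of image features
codes = hash_code(x @ F)         # cross-modal hash codes for the images
loss = cycle_loss(x)
```

In the full method this cycle term is added to the adversarial objective, which is what allows training without paired samples across modalities.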
Medical image retrieval for augmenting diagnostic radiology
Even though the use of medical imaging to diagnose patients is ubiquitous in clinical settings, interpreting these images remains challenging for radiologists. Many factors make this interpretation task difficult; one of them is that medical images sometimes present clues that are subtle yet crucial for diagnosis. Worse, similar clues can indicate multiple diseases, making it challenging to settle on a definitive diagnosis. To help radiologists quickly and accurately interpret medical images, there is a need for a tool that can augment their diagnostic procedures and increase the efficiency of their daily workflow. A general-purpose medical image retrieval system can be such a tool, as it allows them to search and retrieve similar, already-diagnosed cases and make comparative analyses that complement their diagnostic decisions. In this thesis, we contribute to developing such a system by proposing approaches to be integrated as modules of a single system, enabling it to handle various information needs of radiologists and thus augment their diagnostic processes during the interpretation of medical images.
We have mainly studied the following retrieval approaches to handle radiologists’ different information needs: i) Retrieval Based on Contents; ii) Retrieval Based on Contents, Patients’ Demographics, and Disease Predictions; and iii) Retrieval Based on Contents and Radiologists’ Text Descriptions. In the first study, we aimed to find an effective feature representation method to distinguish medical images considering their semantics and modalities. To do so, we experimented with different representation techniques based on handcrafted methods (mainly texture features) and deep learning (deep features). Based on the experimental results, we propose an effective feature representation approach and deep learning architectures for learning and extracting medical image contents. In the second study, we present a multi-faceted method that complements image contents with patients’ demographics and deep learning-based disease predictions, making it able to identify similar cases accurately, considering the clinical context the radiologists seek.
In the last study, we propose a guided search method that integrates an image with a radiologist’s text description to steer the retrieval process. This method ensures that the retrieved images are suitable for the comparative analysis used to confirm or rule out initial diagnoses (the differential diagnosis procedure). Furthermore, our method is based on a deep metric learning technique and outperforms traditional content-based approaches that rely on image features alone and thus sometimes retrieve insignificant, random images.
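The guided search idea, fusing an image embedding with a text-description embedding before ranking, can be sketched as below. The weighted-sum fusion, the weight `w`, and the toy two-dimensional embeddings are illustrative assumptions; the thesis's actual model learns the shared space with deep metric learning.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def guided_query(img_emb, txt_emb, w=0.5):
    # Hypothetical fusion: a weighted sum of the image embedding and the
    # radiologist's text-description embedding, both assumed to live in one
    # shared space produced by a metric-learning model.
    q = w * img_emb + (1.0 - w) * txt_emb
    return q / (np.linalg.norm(q) + 1e-12)

def retrieve(query, gallery, k=3):
    """Return indices of the k gallery cases most similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: -scores[i])[:k]

# Toy example: the fused query should rank the matching case first.
q = guided_query(np.array([1.0, 0.0]), np.array([1.0, 0.0]))
gallery = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
top = retrieve(q, gallery, k=1)
```

Letting the text description shift the query vector is what allows the same image to retrieve different comparison cases depending on the differential diagnosis the radiologist has in mind.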
Autoencoder-based Image Recommendation for Lung Cancer Characterization
In this project, we aim to develop an AI system that recommends a set of related (past) cases to guide the decision-making of the clinician.
Objective: The ambition is to develop an AI-based learning model for lung cancer characterization in order to assist in clinical routine. Considering the complexity of the biological phenomena that occur during cancer development, the relationships between these phenomena and the visual manifestations captured by computed tomography (CT) have been explored in recent years. However, given the lack of robustness of current deep learning methods, these correlations are often found to be spurious and get lost when facing data collected from shifted distributions: different institutions, demographics, or even stages of cancer development.