Information-Theoretic Active Learning for Content-Based Image Retrieval
We propose Information-Theoretic Active Learning (ITAL), a novel batch-mode
active learning method for binary classification, and apply it for acquiring
meaningful user feedback in the context of content-based image retrieval.
Instead of combining different heuristics such as uncertainty, diversity, or
density, our method is based on maximizing the mutual information between the
predicted relevance of the images and the expected user feedback regarding the
selected batch. We propose suitable approximations to this computationally
demanding problem and also integrate an explicit model of user behavior that
accounts for possible incorrect labels and unnameable instances. Furthermore,
our approach takes into account not only the structure of the data but also
the expected model output change caused by the user feedback. In contrast to
other methods, ITAL turns out to be highly flexible and provides
state-of-the-art performance across various datasets, such as MIRFLICKR and
ImageNet.
Comment: GCPR 2018 paper (14 pages text + 2 pages references + 6 pages appendix)
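The mutual-information criterion described above can be illustrated with a simple ensemble-based estimate: the information a candidate's label carries about the model is the entropy of the mean prediction minus the mean entropy of the individual predictions. This is a minimal illustrative sketch (a BALD-style approximation), not the paper's actual batch-mode approximations or its user-behavior model:

```python
import numpy as np

def binary_entropy(p):
    """Entropy of a Bernoulli distribution with success probability p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def mutual_information_scores(probs):
    """MI between predicted relevance and the model, estimated from an
    ensemble: I = H(mean prediction) - mean(H(individual predictions)).
    probs has shape (n_models, n_candidates)."""
    mean_p = probs.mean(axis=0)
    return binary_entropy(mean_p) - binary_entropy(probs).mean(axis=0)

def select_batch(probs, batch_size):
    """Pick the candidates with the highest estimated information gain."""
    return np.argsort(mutual_information_scores(probs))[::-1][:batch_size]
```

A candidate on which the ensemble disagrees (e.g. probabilities 0.9 and 0.1) scores higher than one where every member is merely uncertain (all 0.5), which is the behaviour that separates mutual information from plain uncertainty sampling.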
Long-Term Visual Object Tracking Benchmark
We propose a new long video dataset (called Track Long and Prosper - TLP) and
benchmark for single object tracking. The dataset consists of 50 HD videos from
real world scenarios, encompassing a duration of over 400 minutes (676K
frames), making it more than 20 times larger in average duration per sequence
and more than 8 times larger in total covered duration than existing generic
datasets for visual tracking. The proposed dataset paves the way
to suitably assess long term tracking performance and train better deep
learning architectures (avoiding/reducing augmentation, which may not reflect
real world behaviour). We benchmark 17 state-of-the-art trackers on the dataset
and rank them according to tracking accuracy and runtime speed. We further
present a thorough qualitative and quantitative evaluation highlighting the
importance of the long-term aspect of tracking. Our most interesting
observations are (a) existing short-sequence benchmarks fail to bring out the
inherent differences between tracking algorithms, which widen while tracking on
long sequences, and (b) the accuracy of trackers drops abruptly on challenging
long sequences, suggesting the need for research efforts in the direction of
long-term tracking.
Comment: ACCV 2018 (Oral)
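Tracking accuracy in benchmarks of this kind is typically measured as the mean intersection-over-union (IoU) between predicted and ground-truth boxes over a sequence. A minimal sketch of that metric (an assumed, common formulation; the benchmark's exact evaluation protocol may differ):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def average_overlap(predictions, ground_truth):
    """Mean IoU over a sequence -- a common tracking accuracy measure."""
    return sum(iou(p, g) for p, g in zip(predictions, ground_truth)) / len(ground_truth)
```

On long sequences this average is dominated by whether a tracker recovers after drift, which is exactly the property short-sequence benchmarks fail to expose.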
Hard Occlusions in Visual Object Tracking
Visual object tracking is among the hardest problems in computer vision, as
trackers have to deal with many challenging circumstances such as illumination
changes, fast motion, and occlusion, among others. A tracker is judged good
or not based on its performance on recent tracking datasets, e.g., VOT2019
and LaSOT. We argue that while these recent datasets contain large sets of
annotated videos that, to some extent, provide ample training
data, hard scenarios such as occlusion and in-plane rotation are still
underrepresented. For trackers to be brought closer to the real-world scenarios
and deployed in safety-critical devices, even the rarest hard scenarios must be
properly addressed. In this paper, we particularly focus on hard occlusion
cases and benchmark the performance of recent state-of-the-art (SOTA) trackers
on them. We created a small-scale dataset containing different categories
within hard occlusions, on which the selected trackers are evaluated. Results
show that hard occlusions remain a very challenging problem for SOTA trackers.
Furthermore, it is observed that tracker performance varies wildly between
different categories of hard occlusions, where a top-performing tracker on one
category performs significantly worse on a different category. The varying
nature of tracker performance based on specific categories suggests that the
common tracker rankings using averaged single performance scores are not
adequate to gauge tracker performance in real-world scenarios.
Comment: Accepted at ECCV 2020 Workshop RLQ-TO
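The observation that averaged scores hide category-level inversions can be made concrete with a toy comparison. The numbers below are purely hypothetical, invented for illustration, not results from the paper's benchmark:

```python
from statistics import mean

# Hypothetical per-category success rates for two trackers
# (illustrative only; not values from the paper).
scores = {
    "tracker_A": {"partial_occlusion": 0.80, "full_occlusion": 0.20},
    "tracker_B": {"partial_occlusion": 0.50, "full_occlusion": 0.45},
}

def overall(tracker):
    """Averaged single score of the kind common rankings use."""
    return mean(scores[tracker].values())

def best_on(category):
    """Which tracker wins on one specific occlusion category."""
    return max(scores, key=lambda t: scores[t][category])
```

Here tracker_A wins the averaged ranking (0.50 vs 0.475), yet tracker_B is clearly better on full occlusions, which is the failure mode a safety-critical deployment may actually care about.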
BRISC—An Open Source Pulmonary Nodule Image Retrieval Framework
We have created a content-based image retrieval framework for computed tomography images of pulmonary nodules. When presented with a nodule image, the system retrieves images of similar nodules from a collection prepared by the Lung Image Database Consortium (LIDC). The system (1) extracts images of individual nodules from the LIDC collection based on LIDC expert annotations, (2) stores the extracted data in a flat XML database, (3) calculates a set of quantitative descriptors for each nodule that provide a high-level characterization of its texture, and (4) uses various measures to determine the similarity of two nodules and perform queries on a selected query nodule. Using our framework, we compared three feature extraction methods: Haralick co-occurrence, Gabor filters, and Markov random fields. Gabor and Markov descriptors perform better at retrieving similar nodules than do Haralick co-occurrence techniques, with best retrieval precisions in excess of 88%. Because the software we have developed and the reference images are both open source and publicly available, they may be incorporated into both commercial and academic imaging workstations and extended by others in their research.
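The Haralick co-occurrence descriptors mentioned above are statistics of a grey-level co-occurrence matrix (GLCM). A minimal sketch of that computation, assuming the image has already been quantized to a small number of grey levels (not the framework's actual implementation):

```python
import numpy as np

def glcm(q, levels, dx=1, dy=0):
    """Grey-level co-occurrence matrix of a quantized image q (values in
    [0, levels)) for the pixel offset (dx, dy), normalized to probabilities."""
    P = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            P[q[y, x], q[y + dy, x + dx]] += 1
    return P / P.sum()

def contrast(P):
    """Haralick contrast: expected squared grey-level difference."""
    i, j = np.indices(P.shape)
    return float(((i - j) ** 2 * P).sum())

def energy(P):
    """Haralick energy (angular second moment)."""
    return float((P ** 2).sum())
```

A flat region yields zero contrast and maximal energy, while high-frequency texture (such as a vascular or spiculated nodule boundary) drives contrast up, which is why such statistics serve as texture signatures.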
Learning Tversky Similarity
In this paper, we advocate Tversky's ratio model as an appropriate basis for
computational approaches to semantic similarity, that is, the comparison of
objects such as images in a semantically meaningful way. We consider the
problem of learning Tversky similarity measures from suitable training data
indicating whether two objects tend to be similar or dissimilar.
Experimentally, we evaluate our approach to similarity learning on two image
datasets, showing that it performs very well compared to existing methods.
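Tversky's ratio model compares two objects through their common and distinctive features. A minimal set-based instance is sketched below (the paper's contribution is learning such measures from data, e.g. fitting the weights and feature representation, which this sketch does not do):

```python
def tversky(a, b, alpha=0.5, beta=0.5):
    """Tversky's ratio model over feature sets a and b:
    f(A∩B) / (f(A∩B) + alpha*f(A−B) + beta*f(B−A)), with f = set size.
    alpha and beta weight the two directions of difference; alpha=beta=0.5
    recovers the Dice coefficient and alpha=beta=1 the Jaccard index."""
    a, b = set(a), set(b)
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 1.0
```

The asymmetry permitted by alpha != beta is the model's distinctive property: it can express that "a variant is similar to the prototype" more strongly than the reverse, which symmetric measures cannot.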
The State of the Art of Medical Imaging Technology: from Creation to Archive and Back
Medical imaging has embedded itself deeply in modern medicine and has revolutionized the medical industry over the last 30 years. Stemming from the discovery of X-rays by Nobel laureate Wilhelm Roentgen, radiology was born, leading to the creation of large quantities of digital images as opposed to film-based media. While this rich supply of images provides immeasurable information that would otherwise not be obtainable, medical images pose great challenges: they must be archived safely against corruption, loss, and misuse; remain retrievable from databases of huge size with varying forms of metadata; and stay reusable as new tools for data mining and new media for data storage become available. This paper provides a summative account of the creation of medical imaging tomography, the development of image archiving systems, and the innovation built on the existing pools of acquired image data. The focus of this paper is on content-based image retrieval (CBIR), in particular for 3D images, which is exemplified by our developed online e-learning system, MIRAGE, home to a repository of medical images spanning a variety of domains and dimensions. In terms of novelty, CBIR facilities for 3D images coupled with fully automatic image annotation have been developed and implemented in the system, pointing toward future versatile, flexible and sustainable medical image databases that can reap new innovations.
XMIAR: X-ray medical image annotation and retrieval
The rapid growth of digitized medical images has driven the expansion of and
research into Content-Based Image Retrieval (CBIR) systems. Those systems
retrieve and index images by their low-level features, such as texture, shape
and color. However, those visual features do not allow users to request images
by their semantic meaning. Image annotation or classification systems can be
considered a solution to this limitation of CBIR: to reduce the semantic gap,
they aim to annotate or classify images with a few controlled keywords. In this
paper, we propose a new hierarchical classification scheme for X-ray medical
images using two machine learning techniques, the Support Vector Machine (SVM)
and k-Nearest Neighbour (k-NN). The hierarchical classification design is
based on the main body region. Evaluation was conducted on the ImageCLEF2005
database. The results obtained in this research improve upon
previous related studies.
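The two-level "body region first, sub-region second" idea can be sketched with a plain 1-NN classifier (a hypothetical simplification; the paper uses SVM and k-NN on real X-ray features, and the feature vectors and labels below are invented for illustration):

```python
import math

def nn_predict(train, x):
    """1-NN prediction: train is a list of (feature_vector, label) pairs."""
    return min(train, key=lambda t: math.dist(t[0], x))[1]

def hierarchical_predict(region_train, subregion_train, x):
    """Two-level scheme: first predict the main body region, then refine
    within that region using a region-specific training set."""
    region = nn_predict(region_train, x)
    return region, nn_predict(subregion_train[region], x)
```

Routing each image through a region-level decision first means the second-level classifier only has to separate classes within one body region, which is the usual motivation for such hierarchies.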
Interpretation, Evaluation and the Semantic Gap ... What if we Were on a Side-Track?
A significant amount of research in Document Image Analysis, and Machine Perception in general, relies on the extraction and analysis of signal cues with the goal of interpreting them into higher-level information. This paper gives an overview of how this interpretation process is usually considered, and how the research communities proceed in evaluating existing approaches and methods developed for realizing these processes. Evaluation being an essential part of measuring the quality of research and assessing the progress of the state of the art, our work aims at showing that classical evaluation methods are not necessarily well suited for interpretation problems, or, at least, that they introduce a strong bias, not necessarily visible at first sight, and that new ways of comparing methods and measuring performance are necessary. It also shows that the infamous Semantic Gap seems to be an inherent and unavoidable part of the general interpretation process, especially when considered within the framework of traditional evaluation. The use of Formal Concept Analysis is put forward to leverage these limitations into a new tool for the analysis and comparison of interpretation contexts.
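Formal Concept Analysis rests on two derivation operators over a binary object-attribute context; a pair (extent, intent) is a formal concept exactly when each derives the other. A minimal sketch with an invented toy context (illustrative only; the paper's interpretation contexts are of course richer):

```python
def common_attributes(objs, context, all_attributes):
    """Derivation operator: attributes shared by all objects in objs."""
    if not objs:
        return set(all_attributes)
    return set.intersection(*(set(context[o]) for o in objs))

def common_objects(attrs, context):
    """Dual derivation: objects possessing every attribute in attrs."""
    return {o for o, have in context.items() if set(attrs) <= set(have)}

def is_formal_concept(objs, attrs, context, all_attributes):
    """(extent, intent) is a formal concept iff each side derives the other."""
    return (common_attributes(objs, context, all_attributes) == set(attrs)
            and common_objects(attrs, context) == set(objs))
```

The concept lattice built from all such pairs gives a structured view of which interpretations a context can support, which is the analytical tool the paper proposes.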
2D recurrent neural networks for robust visual tracking of non-rigid bodies
© Springer International Publishing Switzerland 2016. The efficient tracking of articulated bodies over time is an essential element of pattern recognition and dynamic scene analysis. This paper proposes a novel method for robust visual tracking, based on the combination of image-based prediction and weighted correlation. Starting from an initial guess, neural computation is applied to predict the position of the target in each video frame. Normalized cross-correlation is then applied to refine the predicted target position. Image-based prediction relies on a novel architecture, derived from Elman's Recurrent Neural Network and adopting nearest-neighborhood connections between the input and context layers in order to store the temporal information content of the video. The proposed architecture, named 2D Recurrent Neural Network, ensures both limited complexity and a very fast learning stage. At the same time, it guarantees fast execution times and excellent accuracy for the considered tracking task. The effectiveness of the proposed approach is demonstrated on a very challenging set of dynamic image sequences, extracted from the triple jump final at the London 2012 Summer Olympics. The system shows remarkable performance in all considered cases, characterized by changing backgrounds and a large variety of articulated motions.
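The refinement step named above, normalized cross-correlation around a predicted position, can be sketched as an exhaustive local search (a minimal illustration of the standard NCC template-matching step, not the paper's full prediction/correlation pipeline):

```python
import numpy as np

def ncc(window, template):
    """Normalized cross-correlation between two equal-sized patches."""
    w = window - window.mean()
    t = template - template.mean()
    denom = np.sqrt((w ** 2).sum() * (t ** 2).sum())
    return float((w * t).sum() / denom) if denom else 0.0

def refine(frame, template, cx, cy, radius):
    """Refine a predicted top-left position (cx, cy) by exhaustive NCC search
    in a (2*radius+1)^2 neighbourhood; returns the best-matching position."""
    th, tw = template.shape
    best, best_score = (cx, cy), -2.0
    for y in range(max(0, cy - radius), cy + radius + 1):
        for x in range(max(0, cx - radius), cx + radius + 1):
            window = frame[y:y + th, x:x + tw]
            if window.shape != template.shape:
                continue  # search window partially outside the frame
            score = ncc(window, template)
            if score > best_score:
                best_score, best = score, (x, y)
    return best
```

Because the neural predictor supplies a good initial guess, the correlation search only needs a small radius, which is what keeps the combined method fast.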
Saliency-weighted graphs for efficient visual content description and their applications in real-time image retrieval systems
The exponential growth in the volume of digital image databases is making it increasingly difficult to retrieve relevant information from them. Efficient retrieval systems require distinctive features extracted from visually rich contents, represented semantically in a human perception-oriented manner. This paper presents an efficient framework to model image contents as an undirected attributed relational graph, exploiting color, texture, layout, and saliency information. The proposed method encodes salient features into this rich representative model without requiring any segmentation or clustering procedures, reducing the computational complexity. In addition, an efficient graph-matching procedure implemented on specialized hardware makes it more suitable for real-time retrieval applications. The proposed framework has been tested on three publicly available datasets, and the results prove its superiority in terms of both effectiveness and efficiency in comparison with other state-of-the-art schemes. Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2013R1A1A2012904).
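Comparing two attributed relational graphs requires matching nodes by their attribute vectors. The greedy assignment below is a deliberately crude stand-in for the specialized graph-matching procedure the paper describes (it ignores edge structure entirely and is included only to make the node-attribute side of the idea concrete):

```python
import math

def greedy_graph_distance(nodes_a, nodes_b):
    """Greedily assign each attribute vector in nodes_a to its nearest unused
    vector in nodes_b and sum the matched distances (lower = more similar)."""
    used, total = set(), 0.0
    for a in nodes_a:
        best_j, best_d = None, float("inf")
        for j, b in enumerate(nodes_b):
            if j in used:
                continue
            d = math.dist(a, b)
            if d < best_d:
                best_d, best_j = d, j
        if best_j is not None:
            used.add(best_j)
            total += best_d
    return total
```

A full attributed-graph matcher would additionally score the consistency of edges between matched node pairs; greedy assignment is also not optimal, which is one reason real-time systems move matching onto specialized hardware.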