993 research outputs found
Semantic Retrieval and Automatic Annotation: Linear Transformations, Correlation and Semantic Spaces
This paper proposes a new technique for auto-annotation and semantic retrieval based upon the idea of linearly mapping an image feature space to a keyword space. The new technique is compared to several related techniques, and a number of salient points about each of the techniques are discussed and contrasted. The paper also discusses how these techniques might actually scale to a real-world retrieval problem, and demonstrates this though a case study of a semantic retrieval technique being used on a real-world data-set (with a mix of annotated and unannotated images) from a picture library
Crowd counting using group tracking and local features
In public venues, crowd size is a key indicator of crowd safety and stability. In this paper we propose a crowd counting algorithm that uses tracking and local features to count the number of people in each group as represented by a foreground blob segment, so that the total crowd estimate is the sum of the group sizes. Tracking is employed to improve the robustness of the estimate, by analysing the history of each group, including splitting and merging events. A simplified ground truth annotation strategy results in an approach with minimal setup requirements that is highly accurate
A Novel Semantic Statistical Model for Automatic Image Annotation Using the Relationship between the Regions Based on Multi-Criteria Decision Making
Automatic image annotation has emerged as an important research topic due to the existence of the semantic gap and in addition to its potential application on image retrieval and management. In this paper we present an approach which combines regional contexts and visual topics to automatic image annotation. Regional contexts model the relationship between the regions, whereas visual topics provide the global distribution of topics over an image. Conventional image annotation methods neglected the relationship between the regions in an image, while these regions are exactly explanation of the image semantics, therefore considering the relationship between them are helpful to annotate the images. The proposed model extracts regional contexts and visual topics from the image, and incorporates them by MCDM (Multi Criteria Decision Making) approach based on TOPSIS (Technique for Order Preference by Similarity to the Ideal Solution) method. Regional contexts and visual topics are learned by PLSA (Probability Latent Semantic Analysis) from the training data. The experiments on 5k Corel images show that integrating these two kinds of information is beneficial to image annotation.DOI:http://dx.doi.org/10.11591/ijece.v4i1.459
Interactive Video Annotation Tool
Proceedings of: Forth International Workshop on User-Centric Technologies and applications (CONTEXTS 2010). Valencia, 7-10 September , 2010.Abstract: Increasingly computer vision discipline needs annotated video databases to realize assessment tasks. Manually providing ground truth data to multimedia resources is a very expensive work in terms of effort, time and economic resources. Automatic and semi-automatic video annotation and labeling is the faster and more economic way to get ground truth for quite large video collections. In this paper, we describe a new automatic and supervised video annotation tool. Annotation tool is a modified version of ViPER-GT tool. ViPER-GT standard version allows manually editing and reviewing video metadata to generate assessment data. Automatic annotation capability is possible thanks to an incorporated tracking system which can deal the visual data association problem in real time. The research aim is offer a system which enables spends less time doing valid assessment models.Publicad
Hybrid image representation methods for automatic image annotation: a survey
In most automatic image annotation systems, images are represented with low level features using either global
methods or local methods. In global methods, the entire image is used as a unit. Local methods divide images into blocks where fixed-size sub-image blocks are adopted as sub-units; or into regions by using segmented regions as sub-units in images. In contrast to typical automatic image annotation methods that use either global or local features exclusively, several recent methods have considered incorporating the two kinds of information, and believe that the combination of the two levels of features is
beneficial in annotating images. In this paper, we provide a
survey on automatic image annotation techniques according to
one aspect: feature extraction, and, in order to complement
existing surveys in literature, we focus on the emerging image annotation methods: hybrid methods that combine both global and local features for image representation
Image annotation and retrieval based on multi-modal feature clustering and similarity propagation.
The performance of content-based image retrieval systems has proved to be inherently constrained by the used low level features, and cannot give satisfactory results when the user\u27s high level concepts cannot be expressed by low level features. In an attempt to bridge this semantic gap, recent approaches started integrating both low level-visual features and high-level textual keywords. Unfortunately, manual image annotation is a tedious process and may not be possible for large image databases. In this thesis we propose a system for image retrieval that has three mains components. The first component of our system consists of a novel possibilistic clustering and feature weighting algorithm based on robust modeling of the Generalized Dirichlet (GD) finite mixture. Robust estimation of the mixture model parameters is achieved by incorporating two complementary types of membership degrees. The first one is a posterior probability that indicates the degree to which a point fits the estimated distribution. The second membership represents the degree of typicality and is used to indentify and discard noise points. Robustness to noisy and irrelevant features is achieved by transforming the data to make the features independent and follow Beta distribution, and learning optimal relevance weight for each feature subset within each cluster. We extend our algorithm to find the optimal number of clusters in an unsupervised and efficient way by exploiting some properties of the possibilistic membership function. We also outline a semi-supervised version of the proposed algorithm. In the second component of our system consists of a novel approach to unsupervised image annotation. Our approach is based on: (i) the proposed semi-supervised possibilistic clustering; (ii) a greedy selection and joining algorithm (GSJ); (iii) Bayes rule; and (iv) a probabilistic model that is based on possibilistic memebership degrees to annotate an image. The third component of the proposed system consists of an image retrieval framework based on multi-modal similarity propagation. The proposed framework is designed to deal with two data modalities: low-level visual features and high-level textual keywords generated by our proposed image annotation algorithm. The multi-modal similarity propagation system exploits the mutual reinforcement of relational data and results in a nonlinear combination of the different modalities. Specifically, it is used to learn the semantic similarities between images by leveraging the relationships between features from the different modalities. The proposed image annotation and retrieval approaches are implemented and tested with a standard benchmark dataset. We show the effectiveness of our clustering algorithm to handle high dimensional and noisy data. We compare our proposed image annotation approach to three state-of-the-art methods and demonstrate the effectiveness of the proposed image retrieval system
Recommended from our members
Explainable and Advisable Learning for Self-driving Vehicles
Deep neural perception and control networks are likely to be a key component of self-driving vehicles. These models need to be explainable - they should provide easy-to-interpret rationales for their behavior - so that passengers, insurance companies, law enforcement, developers, etc., can understand what triggered a particular behavior. Explanations may be triggered by the neural controller, namely introspective explanations, or informed by the neural controller's output, namely rationalizations. Our work has focused on the challenge of generating introspective explanations of deep models for self-driving vehicles. In Chapter 3, we begin by exploring the use of visual explanations. These explanations take the form of real-time highlighted regions of an image that causally influence the network's output (steering control). In the first stage, we use a visual attention model to train a convolution network end-to-end from images to steering angle. The attention model highlights image regions that potentially influence the network's output. Some of these are true influences, but some are spurious. We then apply a causal filtering step to determine which input regions actually influence the output. This produces more succinct visual explanations and more accurately exposes the network's behavior. In Chapter 4, we add an attention-based video-to-text model to produce textual explanations of model actions, e.g. "the car slows down because the road is wet". The attention maps of controller and explanation model are aligned so that explanations are grounded in the parts of the scene that mattered to the controller. We explore two approaches to attention alignment, strong- and weak-alignment. These explainable systems represent an externalization of tacit knowledge. The network's opaque reasoning is simplified to a situation-specific dependence on a visible object in the image. This makes them brittle and potentially unsafe in situations that do not match training data. In Chapter 5, we propose to address this issue by augmenting training data with natural language advice from a human. Advice includes guidance about what to do and where to attend. We present the first step toward advice-giving, where we train an end-to-end vehicle controller that accepts advice. The controller adapts the way it attends to the scene (visual attention) and the control (steering and speed). Further, in Chapter 6, we propose a new approach that learns vehicle control with the help of long-term (global) human advice. Specifically, our system learns to summarize its visual observations in natural language, predict an appropriate action response (e.g. "I see a pedestrian crossing, so I stop"), and predict the controls, accordingly
A novel real-time computational framework for detecting catheters and rigid guidewires in cardiac catheterization procedures
Purpose: Catheters and guidewires are used extensively in cardiac catheterization procedures such as heart arrhythmia treatment (ablation), angioplasty and congenital heart disease treatment. Detecting their positions in fluoroscopic X-ray images is important for several clinical applications, for example, motion compensation, co-registration between 2D and 3D imaging modalities and 3D object reconstruction. Methods: For the generalized framework, a multiscale vessel enhancement filter is first used to enhance the visibility of wire-like structures in the X-ray images. After applying adaptive binarization method, the centerlines of wire-like objects were extracted. Finally, the catheters and guidewires were detected as a smooth path which is reconstructed from centerlines of target wire-like objects. In order to classify electrode catheters which are mainly used in electrophysiology procedures, additional steps were proposed. First, a blob detection method, which is embedded in vessel enhancement filter with no additional computational cost, localizes electrode positions on catheters. Then the type of electrode catheters can be recognized by detecting the number of electrodes and also the shape created by a series of electrodes. Furthermore, for detecting guiding catheters or guidewires, a localized machine learning algorithm is added into the framework to distinguish between target wire objects and other wire-like artifacts. The proposed framework were tested on total 10,624 images which are from 102 image sequences acquired from 63 clinical cases. Results: Detection errors for the coronary sinus (CS) catheter, lasso catheter ring and lasso catheter body are 0.56 ± 0.28 mm, 0.64 ± 0.36 mm and 0.66 ± 0.32 mm, respectively, as well as success rates of 91.4%, 86.3% and 84.8% were achieved. Detection errors for guidewires and guiding catheters are 0.62 ± 0.48 mm and success rates are 83.5%. Conclusion: The proposed computational framework do not require any user interaction or prior models and it can detect multiple catheters or guidewires simultaneously and in real-time. The accuracy of the proposed framework is sub-mm and the methods are robust toward low-dose X-ray fluoroscopic images, which are mainly used during procedures to maintain low radiation dose
Stacked Denoising Autoencoders and Transfer Learning for Immunogold Particles Detection and Recognition
In this paper we present a system for the detection of immunogold particles
and a Transfer Learning (TL) framework for the recognition of these immunogold
particles. Immunogold particles are part of a high-magnification method for the
selective localization of biological molecules at the subcellular level only
visible through Electron Microscopy. The number of immunogold particles in the
cell walls allows the assessment of the differences in their compositions
providing a tool to analise the quality of different plants. For its
quantization one requires a laborious manual labeling (or annotation) of images
containing hundreds of particles. The system that is proposed in this paper can
leverage significantly the burden of this manual task.
For particle detection we use a LoG filter coupled with a SDA. In order to
improve the recognition, we also study the applicability of TL settings for
immunogold recognition. TL reuses the learning model of a source problem on
other datasets (target problems) containing particles of different sizes. The
proposed system was developed to solve a particular problem on maize cells,
namely to determine the composition of cell wall ingrowths in endosperm
transfer cells. This novel dataset as well as the code for reproducing our
experiments is made publicly available.
We determined that the LoG detector alone attained more than 84\% of accuracy
with the F-measure. Developing immunogold recognition with TL also provided
superior performance when compared with the baseline models augmenting the
accuracy rates by 10\%
- âŠ