Hybrid image representation methods for automatic image annotation: a survey
In most automatic image annotation systems, images are represented with low-level features using either global methods or local methods. In global methods, the entire image is used as a unit. Local methods either divide images into fixed-size blocks that serve as sub-units, or segment them into regions that serve as sub-units. In contrast to typical automatic image annotation methods that use either global or local features exclusively, several recent methods incorporate both kinds of information, on the premise that combining the two levels of features is beneficial in annotating images. In this paper, we survey automatic image annotation techniques from one aspect, feature extraction, and, to complement existing surveys in the literature, we focus on an emerging class of image annotation methods: hybrid methods that combine global and local features for image representation.
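As a rough illustration of the three kinds of representation described in this abstract, the sketch below computes a global intensity histogram, a block-based local descriptor, and a hybrid vector that concatenates the two. The function names, the use of grayscale intensity histograms, and plain concatenation as the combination step are all assumptions made for the example; the surveyed methods use a variety of features and fusion schemes.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0-255 intensities (assumed input format)


def intensity_histogram(pixels: List[int], bins: int = 4) -> List[float]:
    """Normalised histogram of 0-255 intensities."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [c / total for c in counts]


def global_features(img: Image, bins: int = 4) -> List[float]:
    """Global method: the entire image is treated as one unit."""
    return intensity_histogram([p for row in img for p in row], bins)


def block_features(img: Image, block: int, bins: int = 4) -> List[float]:
    """Local method: one histogram per fixed-size block, concatenated."""
    width = len(img[0]) if img else 0
    feats: List[float] = []
    for top in range(0, len(img), block):
        for left in range(0, width, block):
            pixels = [p for row in img[top:top + block]
                      for p in row[left:left + block]]
            feats.extend(intensity_histogram(pixels, bins))
    return feats


def hybrid_features(img: Image, block: int, bins: int = 4) -> List[float]:
    """Hybrid representation (illustrative): global and local descriptors side by side."""
    return global_features(img, bins) + block_features(img, block, bins)
```

For a 4x4 image split into 2x2 blocks with two bins, the hybrid vector has 2 global entries followed by 4 x 2 block entries. The global part summarises the overall intensity distribution, while the block part records where in the image each intensity range occurs.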
Multi modal multi-semantic image retrieval
PhD
The rapid growth in the volume of visual information, e.g. images and video, can
overwhelm users’ ability to find and access the specific visual information of interest
to them. In recent years, ontology knowledge-based (KB) image information retrieval
techniques have been adopted in an attempt to extract knowledge from these
images, enhancing the retrieval performance. A KB framework is presented to
promote semi-automatic annotation and semantic image retrieval using multimodal
cues (visual features and text captions). In addition, a hierarchical structure for the KB
allows metadata to be shared that supports multi-semantics (polysemy) for concepts.
The framework builds up an effective knowledge base pertaining to a domain specific
image collection, e.g. sports, and is able to disambiguate and assign high level
semantics to ‘unannotated’ images.
Local feature analysis of visual content, namely using Scale Invariant Feature
Transform (SIFT) descriptors, has been deployed in the ‘Bag of Visual Words’
model (BVW) as an effective method to represent visual content information and to
enhance its classification and retrieval. Local features can be more useful than global
features, e.g. colour, shape or texture, as they are invariant to image scale and orientation
and robust to moderate changes in camera angle. An innovative approach is proposed for the representation,
annotation and retrieval of visual content using a hybrid technique based upon the use
of an unstructured visual word and upon a (structured) hierarchical ontology KB
model. The structural model facilitates the disambiguation of unstructured visual
words and a more effective classification of visual content, compared to a vector
space model, through exploiting local conceptual structures and their relationships.
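The Bag of Visual Words step mentioned in this paragraph can be sketched as follows: each local descriptor is assigned to its nearest codeword, and the image is represented by the normalised histogram of those assignments. This is only a minimal sketch. It assumes the descriptors have already been extracted and the codebook already learned (for example by clustering), uses toy low-dimensional vectors in place of 128-dimensional SIFT descriptors, and does not include the ontology model. All function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword (visual word) closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))


def bovw_histogram(descriptors: Sequence[Vector],
                   codebook: Sequence[Vector]) -> List[float]:
    """Normalised count of how often each visual word occurs in one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]
```

The histogram discards where in the image each visual word occurred, which is why such a representation is called unstructured in the text above.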
The key contributions of this framework in using local features for image
representation include: first, a method to generate visual words using the semantic
local adaptive clustering (SLAC) algorithm which takes term weight and spatial
locations of keypoints into account. Consequently, the semantic information is
preserved. Second, a technique is used to detect the domain specific ‘non-informative
visual words’ which are ineffective at representing the content of visual data and
degrade its categorisation ability. Third, a method to combine an ontology model with
a visual word model to resolve synonym (visual heterogeneity) and polysemy
problems, is proposed. The experimental results show that this approach can discover
semantically meaningful visual content descriptions and recognise specific events,
e.g., sports events, depicted in images efficiently.
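The abstract does not say how the non-informative visual words are detected. Purely as an illustration of the idea, the sketch below substitutes a simple document-frequency rule borrowed from stop-word removal in text retrieval: a visual word that appears in nearly every image of the collection is flagged as carrying little discriminative information. The rule, the `max_df` threshold, and the function names are assumptions, not the thesis's technique.

```python
from typing import List, Set


def document_frequency(image_words: List[List[int]], vocab_size: int) -> List[float]:
    """Fraction of images in which each visual word occurs at least once."""
    df = [0] * vocab_size
    for words in image_words:
        for w in set(words):
            df[w] += 1
    n = len(image_words) or 1
    return [c / n for c in df]


def non_informative_words(image_words: List[List[int]],
                          vocab_size: int,
                          max_df: float = 0.9) -> Set[int]:
    """Visual words occurring in at least `max_df` of the images (assumed rule)."""
    df = document_frequency(image_words, vocab_size)
    return {w for w, f in enumerate(df) if f >= max_df}
```

Words flagged this way could then be dropped from each image's histogram before classification, in the same way stop words are dropped from text documents.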
Since discovering the semantics of an image is an extremely challenging problem, one
promising approach to enhance visual content interpretation is to use any associated
textual information that accompanies an image, as a cue to predict the meaning of an
image, by transforming this textual information into a structured annotation for an
image, e.g. using XML, RDF, OWL or MPEG-7. Although text and images are distinct
types of information representation and modality, there are some strong, invariant,
implicit, connections between images and any accompanying text information.
Semantic analysis of image captions can be used by image retrieval systems to
retrieve selected images more precisely. To do this, Natural Language Processing
(NLP) is first used to extract concepts from image captions. Next, an
ontology-based knowledge model is deployed in order to resolve natural language
ambiguities. To deal with the accompanying text information, two methods to extract
knowledge from textual information have been proposed. First, metadata can be
extracted automatically from text captions and restructured with respect to a semantic
model. Second, the use of Latent Semantic Indexing (LSI) in relation to a domain-specific ontology-based
knowledge model enables the combined framework to tolerate ambiguities and
variations (incompleteness) of metadata. The use of the ontology-based knowledge
model allows the system to find indirectly relevant concepts in image captions and
thus leverage these to represent the semantics of images at a higher level.
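To make the role of LSI concrete, the sketch below applies a truncated SVD to a small term-by-caption count matrix and folds a query into the resulting latent space. It is a minimal sketch of standard LSI only: it assumes NumPy is available, the helper names are hypothetical, and the ontology-based knowledge model that the framework combines with LSI is not modelled here.

```python
import numpy as np


def lsi_fit(term_doc: np.ndarray, k: int):
    """Truncated SVD of a term-by-document matrix, keeping k latent dimensions."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Rows of the last returned array are document coordinates in latent space.
    return u[:, :k], s[:k], vt[:k, :].T


def lsi_fold_in(query_counts: np.ndarray, u_k: np.ndarray, s_k: np.ndarray) -> np.ndarray:
    """Project a query (term-count vector) into the same latent space."""
    return (query_counts @ u_k) / s_k


def cosine_similarities(query_vec: np.ndarray, doc_vecs: np.ndarray) -> np.ndarray:
    """Cosine similarity between the query and every document."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    norms[norms == 0] = 1.0  # avoid division by zero for empty vectors
    return (doc_vecs @ query_vec) / norms
```

A short usage example with made-up captions: with terms `goal, football, striker, racket, tennis` and four captions, a query containing only "goal" also matches a caption that mentions "football" and "striker" but not "goal", because the two captions share a latent dimension.

```python
terms = ["goal", "football", "striker", "racket", "tennis"]
# Columns are captions: {goal, football}, {football, striker}, {racket, tennis} x 2
term_doc = np.array([[1, 0, 0, 0],
                     [1, 1, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 1, 1],
                     [0, 0, 1, 1]], dtype=float)
u_k, s_k, docs = lsi_fit(term_doc, k=2)
query = lsi_fold_in(np.array([1.0, 0, 0, 0, 0]), u_k, s_k)
similarities = cosine_similarities(query, docs)
```

This is the sense in which LSI tolerates incomplete metadata: relevance is judged on co-occurrence patterns and not on exact term overlap.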
Experimental results show that the proposed framework significantly enhances image
retrieval and narrows the semantic gap between lower-level machine-derived
and higher-level human-understandable conceptualisations.
A Multi-Modal Incompleteness Ontology model (MMIO) to enhance information fusion for image retrieval
This research has been supported in part by the National Science and Technology Development Agency (NSTDA), Thailand. Project No: SCH-NR2011-851
Resolving the Semantic Gap in Information Retrieval: Advanced Image Search Using Topic Models
Tohoku University徳山豪課
Biomedical time series analysis based on bag-of-words model
This research proposes a number of new methods for biomedical time series classification and clustering based on a novel Bag-of-Words (BoW) representation. The objective, automatic clustering and classification technologies developed in this work are expected to benefit a wide range of applications, such as biomedical data management, archiving, retrieval, and disease diagnosis and prognosis.
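The abstract gives no details of the representation, so the following is only a generic sketch of how a Bag-of-Words model is commonly applied to a time series: the signal is cut into sliding windows, each window is mapped to its nearest codeword, and the series is summarised by the codeword counts. The window width, the step, the mean removal, and the pre-learned codebook are all assumptions for illustration and should not be read as the thesis's method.

```python
from typing import List, Sequence


def sliding_windows(series: Sequence[float], width: int, step: int) -> List[List[float]]:
    """Cut a series into fixed-width local segments."""
    return [list(series[i:i + width])
            for i in range(0, len(series) - width + 1, step)]


def window_word(window: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Nearest codeword for a window, after removing the window mean."""
    mean = sum(window) / len(window)
    centred = [x - mean for x in window]
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(centred, codebook[i])))


def series_histogram(series: Sequence[float],
                     codebook: Sequence[Sequence[float]],
                     width: int, step: int) -> List[int]:
    """Count how often each codeword occurs across the series' windows."""
    counts = [0] * len(codebook)
    for w in sliding_windows(series, width, step):
        counts[window_word(w, codebook)] += 1
    return counts
```

Series of different lengths map to histograms of the same length, which is what allows standard classifiers and clustering algorithms to be applied to them.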
A Review on Web Page Classification
With the growth of digital documents on the World Wide Web, and of the webpages and blogs that are common sources of news about current events, aggregating and categorizing information from these sources is a daunting task, as the volume of digital documents available online is growing exponentially. Accurate classification of such documents into their respective categories brings several benefits, such as tools that help people find, filter and analyze digital information on the web. That accuracy depends on the quality of the training dataset, which in turn depends on the preprocessing techniques. Existing literature on web page classification has identified that better document representation techniques reduce training and testing time and improve the accuracy, precision and recall of the classifier. In this paper, we give an overview of web page classification with an in-depth study of the web classification process, while raising awareness of the need for an adequate document representation technique, as this helps capture the semantics of documents and also helps reduce the problem of high dimensionality.
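One widely used document representation is TF-IDF weighting, sketched below as an example of what such a technique does. The review does not single out TF-IDF, so its choice here, the whitespace tokeniser, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Very simple tokeniser: lower-case and split on whitespace."""
    return text.lower().split()


def tfidf(docs: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Return the vocabulary and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        length = len(doc) or 1
        vectors.append([tf[t] / length * idf[t] for t in vocab])
    return vocab, vectors
```

A term that occurs in every page receives a weight of zero under this scheme, so its column can be dropped from the representation, which is one simple way a better representation reduces dimensionality.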
Beyond Visual Words: Exploring Higher-Level Image Representation For Object Categorization
Ph.D. Doctor of Philosophy
Utilizing multiple instance learning for computer vision tasks
Ankara : The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2013. Thesis (Master's) -- Bilkent University, 2013. Includes bibliographical references (leaves 81-89).
The Multiple Instance Learning (MIL) paradigm has proven useful in many application
domains, and it is particularly suitable for computer vision problems
due to the difficulty of obtaining manual labeling. Multiple Instance Learning
methods have large applicability to a variety of challenging learning problems
in computer vision, including object recognition and detection, tracking, image
classification, scene classification and more.
As opposed to working with single instances as in standard supervised learning,
Multiple Instance Learning operates over bags of instances. A bag is labeled
as positive if it is known to contain at least one positive instance; otherwise it
is labeled as negative. The overall learning task is to learn a model for some
concept using a training set that is formed of bags. A vital component of using
Multiple Instance Learning in computer vision is its design for abstracting the
visual problem to multi-instance representation, which involves determining what
the bag is and what the instances in the bag are.
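The bag-labelling rule in this paragraph can be written down directly. In the sketch below, `bag_label` is the rule as stated (a bag is positive if it contains at least one positive instance). `bag_score` and `predict_bag` show one common way to turn per-instance classifier scores into a bag prediction, by taking the maximum; that pooling choice, the 0.5 threshold, and the function names are assumptions for illustration and are not taken from the thesis.

```python
from typing import Sequence


def bag_label(instance_labels: Sequence[bool]) -> bool:
    """A bag is positive if it contains at least one positive instance."""
    return any(instance_labels)


def bag_score(instance_scores: Sequence[float]) -> float:
    """Max-pooling over instances: the bag is as positive as its best instance."""
    return max(instance_scores, default=0.0)


def predict_bag(instance_scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Predict a bag label from instance scores (assumed threshold)."""
    return bag_score(instance_scores) >= threshold
```

In a vision setting, a bag might be an image and its instances candidate regions, so only an image-level label is needed for training.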
In this context, we consider three different computer vision problems and
propose solutions for each of them via novel representations. The first problem
is image retrieval and re-ranking; we propose a method that automatically
constructs multiple candidate multi-instance bags, which are likely to contain
relevant images. The second problem we look into is recognizing actions from
still images, where we extract several candidate object regions and approach the
problem of identifying related objects from a weakly supervised point of view.
Finally, we address the recognition of human interactions in videos within a MIL
framework. In human interaction recognition, videos may be composed of frames of different activities, and the task is to identify the interaction in spite of irrelevant
activities that are scattered through the video. To overcome this problem,
we use the idea of Multiple Instance Learning to tackle irrelevant actions in the
whole video sequence classification. Each of the proposed solutions is tested
on benchmark datasets for its problem and compared with the state of the art.
The experimental results verify the advantages of the proposed MIL approaches
to these vision problems.
Şener, Fadime. M.S.