A Discriminative Representation of Convolutional Features for Indoor Scene Recognition
Indoor scene recognition is a multi-faceted and challenging problem due to
the diverse intra-class variations and the confusing inter-class similarities.
This paper presents a novel approach which exploits rich mid-level
convolutional features to categorize indoor scenes. Traditionally used
convolutional features preserve the global spatial structure, which is a
desirable property for general object recognition. However, we argue that this
structuredness is not very helpful when we have large variations in scene
layouts, e.g., in indoor scenes. We propose to transform the structured
convolutional activations to another highly discriminative feature space. The
representation in the transformed space not only incorporates the
discriminative aspects of the target dataset, but it also encodes the features
in terms of the general object categories that are present in indoor scenes. To
this end, we introduce a new large-scale dataset of 1300 object categories
which are commonly present in indoor scenes. Our proposed approach achieves a
significant performance boost over previous state-of-the-art approaches on five
major scene classification datasets.
Subgraph Networks Based Contrastive Learning
Graph contrastive learning (GCL), as a self-supervised learning method, can
solve the problem of annotated data scarcity. It mines explicit features in
unannotated graphs to generate favorable graph representations for downstream
tasks. Most existing GCL methods focus on the design of graph augmentation
strategies and mutual information estimation operations. Graph augmentation
produces augmented views by graph perturbations. These views preserve a locally
similar structure and exploit explicit features. However, these methods have
not considered the interaction existing in subgraphs. To explore the impact of
substructure interactions on graph representations, we propose a novel
framework called subgraph network-based contrastive learning (SGNCL). SGNCL
applies a subgraph network generation strategy to produce augmented views. This
strategy converts the original graph into an Edge-to-Node mapping network with
both topological and attribute features. The single-shot augmented view is a
first-order subgraph network that mines the interaction between nodes,
node-edge, and edges. In addition, we also investigate the impact of the
second-order subgraph augmentation on mining graph structure interactions, and
further, propose a contrastive objective that fuses the first-order and
second-order subgraph information. We compare SGNCL with classical and
state-of-the-art graph contrastive learning methods on multiple benchmark
datasets of different domains. Extensive experiments show that SGNCL achieves
competitive or better performance (top three) on all datasets in unsupervised
learning settings. Furthermore, SGNCL achieves the best average gain of 6.9\%
in transfer learning compared to the best method. Finally, experiments also
demonstrate that mining substructure interactions has positive implications
for graph contrastive learning. Comment: 12 pages, 6 figures
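The Edge-to-Node mapping described in the abstract above can be illustrated with a small sketch. It assumes that the first-order subgraph network corresponds to the line graph of an undirected simple graph: every original edge becomes a node, and two such nodes are linked when the original edges share an endpoint. The function name `edge_to_node_graph` is hypothetical, attribute features are omitted, and this is not the authors' implementation.

```python
from itertools import combinations


def edge_to_node_graph(edges):
    """Build the line graph of an undirected simple graph (illustrative only).

    edges: iterable of 2-tuples of sortable, hashable node labels.
    Returns (new_nodes, new_links):
      new_nodes: sorted list of the original edges, each stored as a sorted tuple
      new_links: set of (a, b) pairs of new nodes that share an original endpoint
    """
    # Each original edge becomes one node of the new graph.
    new_nodes = sorted({tuple(sorted(e)) for e in edges})

    # Group the new nodes by the original endpoint they touch.
    incident = {}
    for e in new_nodes:
        for endpoint in e:
            incident.setdefault(endpoint, []).append(e)

    # Any two original edges meeting at the same endpoint become linked.
    new_links = set()
    for shared in incident.values():
        for a, b in combinations(shared, 2):
            new_links.add((a, b))
    return new_nodes, new_links


# Path graph 0-1-2-3: three edges, so three nodes in the first-order view.
path_edges = [(0, 1), (1, 2), (2, 3)]
first_nodes, first_links = edge_to_node_graph(path_edges)

# Applying the same mapping to the first-order links gives a second-order view.
second_nodes, second_links = edge_to_node_graph(first_links)
```

Under this assumption, a second-order view is obtained simply by applying the same mapping to the links of the first-order view, as the last line shows.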
Morpho-MNIST: Quantitative Assessment and Diagnostics for Representation Learning
Revealing latent structure in data is an active field of research, having brought exciting new models such as variational autoencoders and generative adversarial networks, and is essential to push machine learning towards unsupervised knowledge discovery. However, a major challenge is the lack of suitable benchmarks for an objective and quantitative evaluation of learned representations. To address this issue, we introduce Morpho-MNIST. We extend the popular MNIST dataset by adding a morphometric analysis enabling quantitative comparison of different models, identification of the roles of latent variables, and characterisation of sample diversity. We further propose a set of quantifiable perturbations to assess the performance of unsupervised and supervised methods on challenging tasks such as outlier detection and domain adaptation.
Domain Generalization -- A Causal Perspective
Machine learning models rely on various assumptions to attain high accuracy.
One of the preliminary assumptions of these models is the independent and
identical distribution, which suggests that the train and test data are sampled
from the same distribution. However, this assumption seldom holds in the real
world due to distribution shifts. As a result, models that rely on this
assumption exhibit poor generalization capabilities. In recent years,
dedicated efforts have been made to improve the generalization capabilities of
these models; the resulting approaches are collectively known as domain generalization methods.
The primary idea behind these methods is to identify stable features or
mechanisms that remain invariant across the different distributions. Many
generalization approaches employ causal theories to describe invariance since
causality and invariance are inextricably intertwined. However, current surveys
deal with the causality-aware domain generalization methods at a very
high level. Furthermore, we argue that it is possible to categorize the methods
based on how each method leverages causality and which part of the
model pipeline it is used in. To this end, we categorize the causal domain
generalization methods into three categories, namely, (i) Invariance via Causal
Data Augmentation methods which are applied during the data pre-processing
stage, (ii) Invariance via Causal representation learning methods that are
utilized during the representation learning stage, and (iii) Invariance via
Transferring Causal mechanisms methods that are applied during the
classification stage of the pipeline. Furthermore, this survey includes
in-depth insights into benchmark datasets and code repositories for domain
generalization methods. We conclude the survey with insights and discussions on
future directions.
Multi modal multi-semantic image retrieval
PhD
The rapid growth in the volume of visual information, e.g. images and video, can
overwhelm users’ ability to find and access the specific visual information of interest
to them. In recent years, ontology knowledge-based (KB) image information retrieval
techniques have been adopted in an attempt to extract knowledge from these
images, enhancing the retrieval performance. A KB framework is presented to
promote semi-automatic annotation and semantic image retrieval using multimodal
cues (visual features and text captions). In addition, a hierarchical structure for the KB
allows metadata to be shared that supports multi-semantics (polysemy) for concepts.
The framework builds up an effective knowledge base pertaining to a domain specific
image collection, e.g. sports, and is able to disambiguate and assign high level
semantics to ‘unannotated’ images.
Local feature analysis of visual content, namely using Scale Invariant Feature
Transform (SIFT) descriptors, has been deployed in the ‘Bag of Visual Words’
model (BVW) as an effective method to represent visual content information and to
enhance its classification and retrieval. Local features are more useful than global
features, e.g. colour, shape or texture, as they are invariant to image scale, orientation
and camera angle. An innovative approach is proposed for the representation,
annotation and retrieval of visual content using a hybrid technique based upon the use
of an unstructured visual word and upon a (structured) hierarchical ontology KB
model. The structural model facilitates the disambiguation of unstructured visual
words and a more effective classification of visual content, compared to a vector
space model, through exploiting local conceptual structures and their relationships.
The key contributions of this framework in using local features for image
representation include: first, a method to generate visual words using the semantic
local adaptive clustering (SLAC) algorithm which takes term weight and spatial
locations of keypoints into account. Consequently, the semantic information is
preserved. Second, a technique is used to detect the domain specific ‘non-informative
visual words’ which are ineffective at representing the content of visual data and
degrade its categorisation ability. Third, a method to combine an ontology model with
a visual word model to resolve synonym (visual heterogeneity) and polysemy
problems, is proposed. The experimental results show that this approach can discover
semantically meaningful visual content descriptions and recognise specific events,
e.g., sports events, depicted in images efficiently.
Since discovering the semantics of an image is an extremely challenging problem, one
promising approach to enhance visual content interpretation is to use any associated
textual information that accompanies an image, as a cue to predict the meaning of an
image, by transforming this textual information into a structured annotation for an
image, e.g. using XML, RDF, OWL or MPEG-7. Although text and image are distinct
types of information representation and modality, there are some strong, invariant,
implicit, connections between images and any accompanying text information.
Semantic analysis of image captions can be used by image retrieval systems to
retrieve selected images more precisely. To do this, Natural Language Processing
(NLP) is first used to extract concepts from image captions. Next, an
ontology-based knowledge model is deployed in order to resolve natural language
ambiguities. To deal with the accompanying text information, two methods to extract
knowledge from textual information have been proposed. First, metadata can be
extracted automatically from text captions and restructured with respect to a semantic
model. Second, the use of Latent Semantic Indexing (LSI) in relation to a domain-specific ontology-based
knowledge model enables the combined framework to tolerate ambiguities and
variations (incompleteness) of metadata. The use of the ontology-based knowledge
model allows the system to find indirectly relevant concepts in image captions and
thus leverage these to represent the semantics of images at a higher level.
Experimental results show that the proposed framework significantly enhances image
retrieval and leads to narrowing of the semantic gap between lower level machine-derived
and higher level human-understandable conceptualisation.
Contrastive Learning and the Emergence of Attributes Associations
In response to an object presentation, supervised learning schemes generally
respond with a parsimonious label. Upon a similar presentation we humans
respond again with a label, but are flooded, in addition, by a myriad of
associations. A significant portion of these consist of the presented object
attributes. Contrastive learning is a semi-supervised learning scheme based on
the application of identity preserving transformations on the object input
representations. It is conjectured in this work that these same applied
transformations preserve, in addition to the identity of the presented object,
also the identity of its semantically meaningful attributes. The corollary of
this is that the output representations of such a contrastive learning scheme
contain valuable information not only for the classification of the presented
object, but also for the presence or absence decision of any attribute of
interest. Simulation results which demonstrate this idea and the feasibility of
this conjecture are presented. Comment: 10 pages
Geographic Knowledge Graph Summarization
Geographic knowledge graphs play a significant role in the geospatial semantics paradigm for fulfilling the interoperability, accessibility, and conceptualization demands of geographic information science. However, due to the immense quantity of accompanying information and the enormous diversity of geographic knowledge graphs, many challenges hinder the applicability and mass adoption of such useful structured knowledge. To tackle these challenges, this dissertation focuses on devising ways in which geographic knowledge graphs can be digested and summarized. Such summarization on the one hand lifts the burden of information overload for end users, and on the other hand reduces data storage, speeds up queries, and helps eliminate noise. The main contribution of this dissertation is that it introduces the general concept of geospatial inductive bias and explains different ways this idea can be used in the geographic knowledge graph summarization task. The task is decomposed into separate but related components: the dissertation is based upon three peer-reviewed articles, which focus on the hierarchical place type structure, multimedia leaf nodes, and general relation and entity components, respectively. A spatial knowledge map interface that illustrates the effectiveness of summarizing geographic knowledge graphs is presented. Throughout the dissertation, top-down knowledge engineering and bottom-up knowledge learning methods are integrated. We hope this dissertation will promote awareness of this fascinating area and motivate researchers to investigate related questions.