Strategies for image visualisation and browsing
PhD
The exploration of large information spaces remains a challenging task, even as database management systems and state-of-the-art retrieval algorithms have become pervasive. Significant research attention in the multimedia domain is focused on finding automatic algorithms for organising digital image collections into meaningful structures and providing high-semantic image indices. At the same time, graphical and interactive methods from the information visualisation domain offer a promising direction for creating efficient user-oriented systems for image management. Methods such as exploratory browsing and querying, as well as intuitive visual overviews of an image collection, can assist users in finding patterns and developing an understanding of the structures and content of complex image data-sets.
The focus of the thesis is combining the features of automatic data processing algorithms with information visualisation. The first part of this thesis focuses on a layout method for displaying collections of images indexed by low-level visual descriptors. The proposed solution generates a graphical overview of the data-set as a combination of similarity-based visualisation and a random layout approach, as sketched below.
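As a rough illustration of that idea, blending a similarity-driven projection with a random layout might look like the following sketch. This is hypothetical, not the thesis's actual algorithm; the use of classical MDS, the descriptor dimensionality, and the blend weight alpha are all assumptions.

import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
descriptors = rng.random((100, 64))   # stand-in low-level visual descriptors

# Similarity-based positions: MDS projection of descriptor distances to 2-D.
similarity_xy = MDS(n_components=2, random_state=0).fit_transform(descriptors)

# Random layout positions, scaled to the same coordinate range.
random_xy = rng.uniform(similarity_xy.min(), similarity_xy.max(),
                        size=similarity_xy.shape)

# Blend the two layouts: alpha=1 is purely similarity-driven, alpha=0 random.
alpha = 0.7
layout = alpha * similarity_xy + (1 - alpha) * random_xy

Varying the blend weight trades faithfulness to descriptor similarity against the even spatial spread that a random layout provides.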
The second part of the thesis deals with the problem of visualising and exploring hierarchical organisations of images. In the absence of semantic information, the images themselves are considered the only source of high-level information. Content previews and a display of the hierarchical structure are combined in order to support image retrieval. In addition, novel exploration and navigation methods are proposed to enable the user to find a way through the database structure and retrieve the content.
On the other hand, semantic information is available in cases where automatic or semi-automatic image classifiers are employed. The automatic annotation of image items provides what is referred to as higher-level information. This type of information is the cornerstone of a multi-concept visualisation framework developed as the third part of this thesis. This solution enables dynamic generation of user queries by combining semantic concepts, supported by a content overview and information filtering.
Comparative analysis and user tests, performed for the evaluation of the proposed solutions, focus on the ways information visualisation affects image content exploration and retrieval, how efficient and comfortable users are with different interaction methods, and the ways users seek information through different types of database organisation.
Active Globally Explainable Learning for Medical Images via Class Association Embedding and Cyclic Adversarial Generation
Explainability poses a major challenge to artificial intelligence (AI)
techniques. Current studies on explainable AI (XAI) lack an efficient means of
extracting global knowledge about the learning task, and thus suffer from
deficiencies such as imprecise saliency, a lack of context awareness, and vague meaning. In this
paper, we propose the class association embedding (CAE) approach to address
these issues. We employ an encoder-decoder architecture to embed sample
features and separate them into class-related and individual-related style
vectors simultaneously. Recombining the individual-style code of a given sample
with the class-style code of another yields a synthetic sample whose
individual characteristics are preserved but whose class assignment is changed, following a cyclic
adversarial learning strategy. Class association embedding distills the global
class-related features of all instances into a unified domain in which the
classes are well separated. The transition rules between different classes
can then be extracted and applied to individual instances. We then propose
an active XAI framework which manipulates the class-style vector of a certain
sample along guided paths towards the counter-classes, resulting in a series of
counter-example synthetic samples with identical individual characteristics.
Comparing these counterfactual samples with the original ones provides a
global, intuitive illustration of the nature of the classification task. We
apply the framework to medical image classification tasks and show that more
precise saliency maps with powerful context-aware representations can be
achieved compared with existing methods. Moreover, disease pathology can be
visualized directly by traversing paths in the class-style space.
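A minimal sketch of the code-splitting and recombination idea described above, assuming a simple fully connected encoder-decoder. The layer sizes are illustrative, and the cyclic adversarial training that actually drives the class/individual separation is omitted.

import torch
import torch.nn as nn

class CAESketch(nn.Module):
    def __init__(self, in_dim=784, class_dim=16, indiv_dim=48):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.to_class = nn.Linear(256, class_dim)  # class-related style code
        self.to_indiv = nn.Linear(256, indiv_dim)  # individual-related style code
        self.decoder = nn.Sequential(
            nn.Linear(class_dim + indiv_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim))

    def encode(self, x):
        h = self.encoder(x)
        return self.to_class(h), self.to_indiv(h)

    def decode(self, class_code, indiv_code):
        return self.decoder(torch.cat([class_code, indiv_code], dim=-1))

model = CAESketch()
x_a, x_b = torch.randn(1, 784), torch.randn(1, 784)
class_a, indiv_a = model.encode(x_a)
class_b, _ = model.encode(x_b)
# Recombination: keep sample a's individual code, take sample b's class code,
# yielding a synthetic sample with a's characteristics but b's class.
x_counter = model.decode(class_b, indiv_a)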
Improving the Security of Smartwatch Payment with Deep Learning
Making contactless payments using a smartwatch is increasingly popular, but
this payment medium lacks traditional biometric security measures such as
facial or fingerprint recognition. In 2022, Sturgess et al. proposed WatchAuth,
a system for authenticating smartwatch payments using the physical gesture of
reaching towards a payment terminal. While effective, the system requires the
user to undergo a burdensome enrolment period to achieve acceptable error
levels. In this dissertation, we explore whether applications of deep learning
can reduce the number of gestures a user must provide to enrol into an
authentication system for smartwatch payment. We first construct a
deep-learned authentication system that outperforms the current
state-of-the-art, including in a scenario where the target user has provided a
limited number of gestures. We then develop a regularised autoencoder model for
generating synthetic user-specific gestures. We show that using these gestures
in training improves classification ability for an authentication system.
Through this technique we can reduce the number of gestures required to enrol a
user into a WatchAuth-like system without negatively impacting its error rates.
Comment: Master's thesis, 74 pages, 32 figures.
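To make the generative idea concrete, a regularised autoencoder for gesture synthesis could be sketched as follows. This is a hedged illustration only: the IMU channel count, gesture length, L2 latent penalty, and latent-jitter sampling are our assumptions, not the dissertation's exact model.

import torch
import torch.nn as nn

T, C, LATENT = 50, 6, 16   # hypothetical: 50 timesteps, 6 IMU channels

class GestureAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(T * C, 128), nn.ReLU(),
                                 nn.Linear(128, LATENT))
        self.dec = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(),
                                 nn.Linear(128, T * C), nn.Unflatten(1, (T, C)))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

model = GestureAE()
x = torch.randn(8, T, C)   # a user's enrolment gestures
recon, z = model(x)
# Reconstruction loss plus an L2 penalty on the latent code as the regulariser.
loss = nn.functional.mse_loss(recon, x) + 1e-3 * z.pow(2).mean()

# After training, jitter the latent codes to synthesise new user-specific
# gestures for augmenting the authentication classifier's training set.
synthetic = model.dec(z + 0.1 * torch.randn_like(z))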
Deep Representation-aligned Graph Multi-view Clustering for Limited Labeled Multi-modal Health Data
Today, many fields are characterised by extensive quantities of data from a wide range of dissimilar sources and domains. One such field is medicine, in which data contain exhaustive combinations of spatial, temporal, linear, and relational data. Often lacking expert-assessed labels, much of this data requires analysis within the fields of unsupervised or semi-supervised learning. Motivated by the notion that higher view counts provide more ways to recognise commonality across views, contrastive multi-view clustering may be utilised to train a model to suppress redundancy and otherwise medically irrelevant information. Yet standard multi-view clustering approaches do not account for relational graph data. Recent developments aim to solve this by utilising various graph operations, including graph-based attention. Within deep-learning graph-based multi-view clustering on a single view-invariant affinity graph, however, representation alignment remains unexplored.
We introduce Deep Representation-Aligned Graph Multi-View Clustering (DRAGMVC), a novel attention-based graph multi-view clustering model. Comparing maximal performance, our model surpasses the state-of-the-art in eleven out of twelve metrics on Cora, CiteSeer, and PubMed. The model considers view alignment at the sample level by employing a contrastive loss, and relational data through a novel take on graph attention embeddings in which we use a Markov chain prior to increase the receptive field of each layer. For clustering, a graph-induced DDC module is used. GraphSAINT sampling is implemented to control our mini-batch space and capitalise on our Markov prior.
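For intuition, sample-level contrastive view alignment can be sketched as an InfoNCE-style loss over paired view embeddings. The temperature and the use of cosine similarity here are assumptions for illustration, not necessarily DRAGMVC's exact formulation.

import torch
import torch.nn.functional as F

def view_alignment_loss(z1, z2, temperature=0.5):
    """z1, z2: (N, d) embeddings of the same N samples from two views."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # (N, N) cross-view similarities
    targets = torch.arange(z1.size(0))   # positives lie on the diagonal
    # Pull matching samples together across views, push the rest apart.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = view_alignment_loss(torch.randn(32, 64), torch.randn(32, 64))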
Additionally, we present the MIMIC pleural effusion graph multi-modal dataset, consisting of 3520 chest X-ray images together with two static views registered within a one-day time frame: vital signs and lab tests. These make up the dataset's three views in total. We note a significant improvement in terms of separability, view mixing, and clustering performance when comparing DRAGMVC to preceding non-graph multi-view clustering models, suggesting a possible, largely unexplored use case for unsupervised graph multi-view clustering on graph-induced, multi-modal, and complex medical data.
Grounding semantic cognition using computational modelling and network analysis
The overarching objective of this thesis is to further the field of grounded semantics using a range of computational and empirical studies. Over the past thirty years, there have been many algorithmic advances in the modelling of semantic cognition. A commonality across these cognitive models is a reliance on hand-engineered "toy models". Despite incorporating newer techniques (e.g. long short-term memory), the model inputs remain unchanged. We argue that the inputs to these traditional semantic models bear little resemblance to real human experiences. In this dissertation, we ground our neural network models by training them on real-world visual scenes using naturalistic photographs. Our approach is an alternative to both hand-coded features and embodied raw sensorimotor signals.
We conceptually replicate the mutually reinforcing nature of hybrid (feature-based and grounded) representations using silhouettes of concrete concepts as model inputs. We next gradually develop a novel grounded cognitive semantic representation, which we call scene2vec, starting with object co-occurrences and then adding emotions and language-based tags. Limitations of our scene-based representation are identified for more abstract concepts (e.g. freedom). We further present a large-scale human semantics study, which reveals that small-world semantic network topologies are context-dependent and that scenes are the most dominant cognitive dimension. This finding leads us to conclude that there is no meaning without context. Lastly, scene2vec shows promising human-like context-sensitive stereotypes (e.g. gender role bias), and we explore how such stereotypes are reduced by targeted debiasing. In conclusion, this thesis provides support for a novel computational viewpoint on investigating meaning: scene-based grounded semantics. Future research scaling scene-based semantic models to human levels through virtual grounding has the potential to unearth new insights into the human mind and concurrently lead to advancements in artificial general intelligence by enabling robots, embodied or otherwise, to acquire and represent meaning directly from the environment.
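As a toy illustration of the co-occurrence starting point for a scene2vec-style representation: the scenes and vocabulary below are invented, and the thesis's actual pipeline further adds emotion and language-based tags on top of counts like these.

from collections import Counter
from itertools import combinations

scenes = [["dog", "ball", "grass"], ["dog", "leash", "street"],
          ["ball", "grass", "child"]]
vocab = sorted({obj for scene in scenes for obj in scene})

# Count symmetric object co-occurrences across scenes.
cooc = Counter()
for scene in scenes:
    for a, b in combinations(sorted(set(scene)), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

# Each concept's vector: its co-occurrence counts with every other object.
def scene_vector(obj):
    return [cooc[(obj, other)] for other in vocab]

print("dog ->", scene_vector("dog"))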
- …