37 research outputs found
Relational clustering models for knowledge discovery and recommender systems
Cluster analysis is a fundamental research field in Knowledge Discovery and Data Mining
(KDD). It aims to partition a given dataset into homogeneous clusters that reflect the
natural hidden structure of the data. Various heuristic and statistical approaches have
been developed for analyzing propositional datasets. In relational clustering, however,
the existence of multi-type relationships greatly degrades the performance of traditional
clustering algorithms. This issue motivates us to find more effective algorithms for
cluster analysis on relational datasets. In this thesis we comprehensively study the
idea of Representative Objects for approximating the data distribution, and then design
a multi-phase clustering framework for analyzing relational datasets with high
effectiveness and efficiency.
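The representative-objects idea can be illustrated with a minimal sketch (not the thesis's actual algorithm): first pick a few representative objects that approximate the data distribution, here via farthest-point sampling, then assign every object to its nearest representative in a second phase.

```python
import numpy as np

def select_representatives(X, k, seed=0):
    """Phase one: pick k representative objects by farthest-point
    sampling, one simple way to approximate the data distribution."""
    rng = np.random.default_rng(seed)
    reps = [int(rng.integers(len(X)))]
    for _ in range(k - 1):
        # distance from every point to its nearest chosen representative
        d = np.min(np.linalg.norm(X[:, None] - X[reps], axis=2), axis=1)
        reps.append(int(np.argmax(d)))
    return reps

def assign_clusters(X, reps):
    """Phase two: assign each object to its nearest representative."""
    d = np.linalg.norm(X[:, None] - X[reps], axis=2)
    return np.argmin(d, axis=1)

# Two well-separated blobs: each ends up with its own representative.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
reps = select_representatives(X, k=2)
labels = assign_clusters(X, reps)
```

Because later phases work only with the representatives rather than all pairwise relations, this kind of scheme trades a small approximation error for a large gain in efficiency.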
The second task considered in this thesis is to provide better data models for both
people and machines to browse and navigate a dataset. The hierarchical taxonomy is
widely used for this purpose. Compared with manually created taxonomies, automatically
derived ones are more appealing because of their low creation and maintenance cost
and high scalability. To date, taxonomy generation techniques have mainly been used
to organize document corpora. We investigate the possibility of applying them to relational
datasets and then propose several algorithmic improvements. Another non-trivial
problem is how to assign suitable labels to the taxonomic nodes so as to credibly summarize
the content of each node. To the best of our knowledge, this problem has not been
investigated sufficiently, so we attempt to fill the gap by proposing several novel
approaches.
The final goal of our cluster analysis and taxonomy generation techniques is
to improve the scalability of recommender systems, which are developed to tackle the
problem of information overload. Recent research in recommender systems integrates
domain knowledge to improve recommendation quality, which, however, reduces the
scalability of the whole system. We address this issue by applying the automatically
derived taxonomy to preserve the pairwise similarities between items, and then modeling
user visits with another hierarchical structure. Experimental results show that the
computational complexity of the recommendation procedure can be greatly reduced, and
thus the system's scalability improved.
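One hedged illustration of how a taxonomy can stand in for pairwise item similarities: derive the similarity of two items from the depth of their lowest common ancestor, so no item-item similarity matrix needs to be stored. The category names and the normalization below are hypothetical, not taken from the thesis.

```python
# A toy taxonomy as child -> parent links (hypothetical categories).
parent = {
    "laptop": "computers", "desktop": "computers",
    "computers": "electronics", "camera": "electronics",
    "electronics": "root",
}

def path_to_root(node):
    """Chain of ancestors from a node up to the taxonomy root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def taxonomy_similarity(a, b):
    """Similarity from the depth of the lowest common ancestor:
    a deeper shared ancestor means more similar items."""
    pa, pb = path_to_root(a), path_to_root(b)
    ancestors_b = set(pb)
    lca = next(n for n in pa if n in ancestors_b)
    depth = len(path_to_root(lca)) - 1        # root has depth 0
    return depth / max(len(pa), len(pb))      # crude normalization

sim_close = taxonomy_similarity("laptop", "desktop")
sim_far = taxonomy_similarity("laptop", "camera")
```

A recommender using such a measure only needs each item's taxonomy path at query time, which is what makes the approach attractive for scalability.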
A Topic Coverage Approach to Evaluation of Topic Models
Topic models are widely used unsupervised models of text capable of learning
topics - weighted lists of words and documents - from large collections of text
documents. When topic models are used for discovery of topics in text
collections, a question that arises naturally is how well the model-induced
topics correspond to topics of interest to the analyst. In this paper we
revisit and extend a so far neglected approach to topic model evaluation based
on measuring topic coverage - computationally matching model topics with a set
of reference topics that models are expected to uncover. The approach is well
suited for analyzing models' performance in topic discovery and for large-scale
analysis of both topic models and measures of model quality. We propose new
measures of coverage and evaluate, in a series of experiments, different types
of topic models on two distinct text domains for which interest in topic
discovery exists. The experiments include evaluation of model quality, analysis
of coverage of distinct topic categories, and the analysis of the relationship
between coverage and other methods of topic model evaluation. The contributions
of the paper include new measures of coverage, insights into both topic models
and other methods of model evaluation, and the datasets and code for
facilitating future research of both topic coverage and other approaches to
topic model evaluation.
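The coverage idea can be sketched as follows, with plain Jaccard word overlap standing in for the paper's actual matching and coverage measures: a reference topic counts as covered if some model topic overlaps it above a threshold.

```python
def jaccard(a, b):
    """Word-overlap similarity between two topics (word lists)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def topic_coverage(model_topics, reference_topics, threshold=0.3):
    """Fraction of reference topics matched by at least one model topic."""
    covered = sum(
        1 for ref in reference_topics
        if max(jaccard(ref, m) for m in model_topics) >= threshold
    )
    return covered / len(reference_topics)

# Hypothetical reference and model topics (top words only).
reference = [["election", "vote", "party"], ["match", "goal", "team"]]
model = [["vote", "election", "poll"], ["economy", "market", "trade"]]
cov = topic_coverage(model, reference)  # only the politics topic is covered
```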
Image similarity in medical images
Recent experiments have indicated a strong influence of the substrate grain orientation on self-ordering in anodic porous alumina. Anodic porous alumina with straight pore channels grown in a stable, self-ordered manner is formed on (001) oriented Al grains, while a disordered porous pattern with tilted pore channels growing in an unstable manner is formed on (101) oriented Al grains. In this work, a numerical simulation of the pore growth process is carried out to understand this phenomenon. The rate-determining step of the oxide growth is assumed to be the Cabrera-Mott barrier at the oxide/electrolyte (o/e) interface, while the substrate is assumed to determine the ratio β between the ionization and oxidation reactions at the metal/oxide (m/o) interface. By numerically solving for the electric field inside the growing porous alumina during anodization, the migration rates of the ions, and hence the evolution of the o/e and m/o interfaces, are computed. The simulated results show that pore growth is more stable when β is higher. A higher β corresponds to more Al being ionized and migrating away from the m/o interface rather than being oxidized, and hence a higher retained O:Al ratio in the oxide. The experimentally measured oxygen content in the self-ordered porous alumina on (001) Al is indeed found to be about 3% higher than that in the disordered alumina on (101) Al, in agreement with the theoretical prediction. The results therefore suggest that ionization on the (001) Al substrate is relatively easier than on (101) Al, and this leads to the more stable growth of the pore channels on (001) Al.
Relational clustering models for knowledge discovery and recommender systems
EThOS - Electronic Theses Online Service. University of Warwick, Department of Computer Science, United Kingdom.
Free-hand Sketch Understanding and Analysis
With the proliferation of touch screens, sketching input has become popular in many software
products. This phenomenon has stimulated a new round of boom in free-hand sketch research,
covering topics like sketch recognition, sketch-based image retrieval, sketch synthesis
and sketch segmentation. Compared with previous sketch research, the newly proposed works
generally employ more complicated sketches in much larger quantities, thanks
to advances in hardware. This thesis thus presents new work on free-hand
sketches, offering novel ideas on the aforementioned topics.
On sketch recognition, Eitz et al. [32] were the first explorers, proposing the large-scale
TU-Berlin sketch dataset [32], which made sketch recognition possible. Following their work,
we continue to analyze the dataset and find that visual cue sparsity and internal structural
complexity are the two biggest challenges for sketch recognition. Accordingly, we propose
multiple kernel learning [45] to fuse multiple visual cues and a star graph representation [12]
to encode the structures of the sketches. With these new schemes, we achieve a significant
improvement in recognition accuracy (from 56% to 65.81%). An experimental study on sketch
attributes is performed to further boost sketch recognition performance and enable novel
retrieval-by-attribute applications.
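The cue-fusion idea can be sketched roughly as follows: multiple kernel learning combines one kernel per visual cue with nonnegative weights. The weights here are fixed for illustration (in MKL proper they are learned), and one RBF kernel per feature column stands in for the per-cue kernels; all data below is made up.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel between the row vectors of A and B."""
    d2 = ((A[:, None] - B[None]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def combined_kernel(A, B, gammas, weights):
    """MKL-style fusion: a weighted sum of one kernel per cue.
    (Real MKL learns the weights; they are fixed here.)"""
    return sum(w * rbf_kernel(A[:, [i]], B[:, [i]], g)
               for i, (g, w) in enumerate(zip(gammas, weights)))

# Two toy "cues" = the two feature columns of each sample.
train = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = np.array([0, 1])
test = np.array([[0.1, 0.1]])
K = combined_kernel(test, train, gammas=[0.5, 2.0], weights=[0.6, 0.4])
pred = labels[np.argmax(K, axis=1)]  # nearest training sample in fused kernel
```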
For sketch-based image retrieval, we start by carefully examining the existing works. After
looking at the big picture of sketch-based image retrieval, we highlight that studying the sketch’s
ability to distinguish intra-category object variations should be the most promising direction to
proceed on, and we define it as the fine-grained sketch-based image retrieval problem. A deformable
part-based model, which captures object part details and object deformations, is proposed
to tackle this new problem, and graph matching is employed to compute the similarity between
deformable part-based models by matching the parts of different models. To evaluate this new
problem, we combine the TU-Berlin sketch dataset and the PASCAL VOC photo dataset [36] to
form a new challenging cross-domain dataset with pairwise sketch-photo similarity ratings, and
our proposed method shows promising results on this new dataset.
Regarding sketch synthesis, we focus on generating real free-hand-style sketches for
general categories, as the closest previous work [8] only managed to show efficacy on a single
category: human faces. The difficulties that prevent sketch synthesis from reaching other
categories include cluttered edges and diverse object variations due to deformation. To address
these difficulties, we propose a deformable stroke model that casts sketch synthesis as a detection
process, directly targeting the cluttered background and the object variations. To ease
the training of such a model, a perceptual grouping algorithm is further proposed that
utilizes the relationship of stroke length to stroke semantics, stroke temporal order, and Gestalt
principles [58] to perform part-level sketch segmentation. The perceptual grouping automatically
provides semantic part-level supervision for training the deformable stroke model, and an
iterative learning scheme is introduced to gradually refine the supervision and the model. With
the learned deformable stroke models, sketches with a distinct free-hand style can be generated
for many categories.
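The part-level matching used for fine-grained retrieval can be illustrated with a much-simplified stand-in: greedy one-to-one matching of part descriptors by cosine similarity, rather than full graph matching over deformable part-based models. All descriptors below are made up.

```python
import numpy as np

def part_similarity(p, q):
    """Cosine similarity between two part descriptors."""
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

def match_parts(parts_a, parts_b):
    """Greedy one-to-one matching of parts by descriptor similarity,
    a simplified stand-in for graph matching between two models."""
    pairs = sorted(((part_similarity(p, q), i, j)
                    for i, p in enumerate(parts_a)
                    for j, q in enumerate(parts_b)), reverse=True)
    used_a, used_b, score = set(), set(), 0.0
    for s, i, j in pairs:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            score += s
    return score / min(len(parts_a), len(parts_b))

# Made-up part descriptors for a sketch and a well-matching photo.
sketch_parts = np.array([[1.0, 0.0], [0.0, 1.0]])
photo_parts = np.array([[0.9, 0.1], [0.1, 0.9]])
sim = match_parts(sketch_parts, photo_parts)
```

A full graph-matching formulation would additionally score the pairwise geometric relations between parts, which is what lets it respect object deformations.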
On the Combination of Game-Theoretic Learning and Multi Model Adaptive Filters
This paper casts the coordination of a team of robots within the framework of game-theoretic learning algorithms. In particular, a novel variant of fictitious play is proposed, in which multi-model adaptive filters are used to estimate the other players' strategies. The proposed algorithm can serve as a coordination mechanism between players when they must make decisions under uncertainty. Each player chooses an action after taking into account the actions of the other players as well as the uncertainty. Uncertainty can arise either as noisy observations or as various types of other players. In addition, in contrast to other game-theoretic and heuristic algorithms for distributed optimisation, it is not necessary to find the optimal parameters a priori: various parameter values can be used initially as inputs to different models, so the resulting decisions are an aggregate over all the parameter values. Simulations are used to test the performance of the proposed methodology against other game-theoretic learning algorithms.
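For context, plain fictitious play, the baseline the proposed variant extends, can be sketched as follows: each player tracks the empirical frequency of the others' past actions and best-responds to that mixed strategy. The multi-model adaptive filtering is omitted here, and the coordination game below is a hypothetical example.

```python
import numpy as np

def fictitious_play(payoff_a, payoff_b, rounds=200):
    """Plain fictitious play for a two-player game. (The paper's variant
    additionally estimates opponent strategies with adaptive filters.)"""
    counts_a = np.ones(payoff_b.shape[0])  # b's belief about a's actions
    counts_b = np.ones(payoff_a.shape[1])  # a's belief about b's actions
    act_a = act_b = 0
    for _ in range(rounds):
        # each player best-responds to the opponent's empirical mixture
        act_a = int(np.argmax(payoff_a @ (counts_b / counts_b.sum())))
        act_b = int(np.argmax((counts_a / counts_a.sum()) @ payoff_b))
        counts_a[act_a] += 1
        counts_b[act_b] += 1
    return act_a, act_b

# A coordination game: both players prefer to choose the same action.
A = np.array([[2.0, 0.0], [0.0, 1.0]])  # row player's payoffs
B = np.array([[2.0, 0.0], [0.0, 1.0]])  # column player's payoffs
a, b = fictitious_play(A, B)
```

In coordination games of this kind, fictitious play converges to a pure Nash equilibrium, which is what makes it attractive for multi-robot task allocation.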