37 research outputs found

    Relational clustering models for knowledge discovery and recommender systems

    Cluster analysis is a fundamental research field in Knowledge Discovery and Data Mining (KDD). It aims at partitioning a given dataset into homogeneous clusters that reflect the natural hidden structure of the data. Various heuristic and statistical approaches have been developed for analyzing propositional datasets; in relational clustering, however, the existence of multi-type relationships greatly degrades the performance of traditional clustering algorithms. This issue motivates us to find more effective algorithms for cluster analysis on relational datasets. In this thesis we comprehensively study the idea of Representative Objects for approximating a data distribution and then design a multi-phase clustering framework for analyzing relational datasets with high effectiveness and efficiency. The second task considered in this thesis is to provide better data models for people, as well as machines, to browse and navigate a dataset. The hierarchical taxonomy is widely used for this purpose. Compared with manually created taxonomies, automatically derived ones are more appealing because of their low creation and maintenance cost and their high scalability. Up to now, taxonomy generation techniques have mainly been used to organize document corpora; we investigate the possibility of applying them to relational datasets and propose some algorithmic improvements. Another non-trivial problem is how to assign suitable labels to the taxonomic nodes so as to credibly summarize the content of each node. To the best of our knowledge this field has not been investigated sufficiently, and we attempt to fill the gap by proposing some novel approaches. The final goal of our cluster analysis and taxonomy generation techniques is to improve the scalability of recommender systems, which are developed to tackle the problem of information overload.
Recent research in recommender systems integrates the exploitation of domain knowledge to improve recommendation quality, which, however, reduces the scalability of the whole system. We address this issue by applying the automatically derived taxonomy to preserve the pair-wise similarities between items, and then modeling user visits by another hierarchical structure. Experimental results show that the computational complexity of the recommendation procedure can be greatly reduced and the system scalability thus improved.
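The taxonomy-based similarity idea can be illustrated with a minimal sketch. All names and the particular similarity definition below are hypothetical stand-ins, not the thesis's actual method: two items are scored by the depth of their lowest common ancestor in the taxonomy, so item relatedness can be approximated without storing a full pairwise similarity matrix.

```python
# Sketch: approximating pairwise item similarity from a taxonomy.
# Similarity of two items = depth of their lowest common ancestor (LCA)
# divided by the depth of the deeper item, so siblings score higher
# than items that only share the root.

def path_to_root(node, parent):
    """Return the list [node, parent, ..., root] for a taxonomy node."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def taxonomy_similarity(a, b, parent):
    pa, pb = path_to_root(a, parent), path_to_root(b, parent)
    ancestors_b = set(pb)
    # the first common node on a's path to the root is the LCA
    lca = next(n for n in pa if n in ancestors_b)
    depth = lambda n: len(path_to_root(n, parent)) - 1
    return depth(lca) / max(depth(a), depth(b))

# toy taxonomy: electronics -> {phones, laptops}; phones -> {p1, p2}
parent = {"phones": "electronics", "laptops": "electronics",
          "p1": "phones", "p2": "phones", "l1": "laptops"}
print(taxonomy_similarity("p1", "p2", parent))  # siblings: 0.5
print(taxonomy_similarity("p1", "l1", parent))  # share only the root: 0.0
```

Because only paths to the root are compared, each similarity query costs time proportional to the taxonomy depth rather than to the number of items, which is the kind of saving the scalability argument relies on.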

    A Topic Coverage Approach to Evaluation of Topic Models

    Topic models are widely used unsupervised models of text capable of learning topics (weighted lists of words and documents) from large collections of text documents. When topic models are used to discover topics in text collections, a question that arises naturally is how well the model-induced topics correspond to the topics of interest to the analyst. In this paper we revisit and extend a so-far neglected approach to topic model evaluation based on measuring topic coverage: computationally matching model topics against a set of reference topics that the models are expected to uncover. The approach is well suited to analyzing models' performance in topic discovery and to large-scale analysis of both topic models and measures of model quality. We propose new measures of coverage and evaluate, in a series of experiments, different types of topic models on two distinct text domains for which an interest in topic discovery exists. The experiments include an evaluation of model quality, an analysis of the coverage of distinct topic categories, and an analysis of the relationship between coverage and other methods of topic model evaluation. The contributions of the paper include new measures of coverage, insights into both topic models and other methods of model evaluation, and the datasets and code for facilitating future research on both topic coverage and other approaches to topic model evaluation.
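The coverage idea can be sketched as follows; the matching function here (Jaccard overlap of top-word sets against a fixed threshold) is an illustrative stand-in, not necessarily one of the paper's proposed measures.

```python
# Sketch of topic coverage: match model topics against reference topics
# and report the fraction of reference topics that are recovered.

def jaccard(a, b):
    """Jaccard overlap between two word lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def topic_coverage(model_topics, reference_topics, threshold=0.5):
    """Fraction of reference topics matched by at least one model topic."""
    covered = sum(
        1 for ref in reference_topics
        if any(jaccard(ref, t) >= threshold for t in model_topics)
    )
    return covered / len(reference_topics)

reference = [["election", "vote", "party"], ["match", "goal", "team"]]
model = [["election", "vote", "party", "poll"],
         ["stock", "market", "price"]]
print(topic_coverage(model, reference))  # 0.5: one of two reference topics covered
```

Note that coverage is asymmetric by design: it penalizes missed reference topics but not extra model topics, which is what distinguishes it from symmetric model-quality scores.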

    Multi-Source Spatial Entity Extraction and Linkage


    Image similarity in medical images

    Recent experiments have indicated a strong influence of the substrate grain orientation on self-ordering in anodic porous alumina. Anodic porous alumina with straight pore channels grown in a stable, self-ordered manner is formed on (001)-oriented Al grains, while a disordered porous pattern with tilted pore channels growing in an unstable manner is formed on (101)-oriented Al grains. In this work, a numerical simulation of the pore growth process is carried out to understand this phenomenon. The rate-determining step of the oxide growth is assumed to be the Cabrera-Mott barrier at the oxide/electrolyte (o/e) interface, while the substrate is assumed to determine the ratio β between the ionization and oxidation reactions at the metal/oxide (m/o) interface. By numerically solving for the electric field inside a growing porous alumina film during anodization, the migration rates of the ions, and hence the evolution of the o/e and m/o interfaces, are computed. The simulated results show that pore growth is more stable when β is higher. A higher β corresponds to more Al being ionized and migrating away from the m/o interface rather than being oxidized, and hence a higher retained O:Al ratio in the oxide. The experimentally measured oxygen content in the self-ordered porous alumina on (001) Al is indeed found to be about 3% higher than that in the disordered alumina on (101) Al, in agreement with the theoretical prediction. The results therefore suggest that ionization on the (001) Al substrate is relatively easier than on (101) Al, and this leads to the more stable growth of the pore channels on (001) Al.

    Image similarity in medical images


    Relational clustering models for knowledge discovery and recommender systems

    EThOS - Electronic Theses Online Service. University of Warwick, Dept. of Computer Science, United Kingdom.

    Free-hand Sketch Understanding and Analysis

    With the proliferation of touch screens, sketch input has become popular in many software products. This phenomenon has stimulated a new round of free-hand sketch research, covering topics such as sketch recognition, sketch-based image retrieval, sketch synthesis and sketch segmentation. Compared with previous sketch work, the newly proposed studies generally employ more complicated sketches, and sketches in much larger quantities, thanks to advances in hardware. This thesis presents new work on free-hand sketches, offering novel ideas on the aforementioned topics. On sketch recognition, Eitz et al. [32] were the first explorers, proposing the large-scale TU-Berlin sketch dataset [32] that made sketch recognition possible. Following their work, we further analyze the dataset and find that visual cue sparsity and internal structural complexity are the two biggest challenges for sketch recognition. Accordingly, we propose multiple kernel learning [45] to fuse multiple visual cues, and a star-graph representation [12] to encode the structure of the sketches. With these new schemes, we achieve a significant improvement in recognition accuracy (from 56% to 65.81%). An experimental study of sketch attributes is performed to further boost sketch recognition performance and to enable novel retrieval-by-attribute applications. For sketch-based image retrieval, we start by carefully examining existing work. After surveying the field, we argue that studying a sketch's ability to distinguish intra-category object variations is the most promising direction to pursue, and we define this as the fine-grained sketch-based image retrieval problem.
A deformable part-based model, which captures object part details and object deformations, is proposed to tackle this new problem, and graph matching is employed to compute the similarity between deformable part-based models by matching the parts of different models. To evaluate this new problem, we combine the TU-Berlin sketch dataset and the PASCAL VOC photo dataset [36] to form a new, challenging cross-domain dataset with pairwise sketch-photo similarity ratings, and our proposed method shows promising results on this new dataset. Regarding sketch synthesis, we focus on generating real free-hand-style sketches for general categories, as the closest previous work [8] only demonstrated efficacy on a single category: human faces. The difficulties that prevent sketch synthesis from reaching other categories include cluttered edges and diverse object variations due to deformation. To address these difficulties, we propose a deformable stroke model that casts sketch synthesis as a detection process, directly targeting the cluttered background and the object variations. To ease the training of such a model, a perceptual grouping algorithm is further proposed that exploits the relationship between stroke length and stroke semantics, stroke temporal order, and Gestalt principles [58] to perform part-level sketch segmentation. The perceptual grouping automatically provides semantic part-level supervision for training the deformable stroke model, and an iterative learning scheme is introduced to gradually refine the supervision and the model training. With the learned deformable stroke models, sketches with a distinct free-hand style can be generated for many categories.
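The cue-fusion idea behind multiple kernel learning can be sketched minimally: each visual cue yields its own kernel (similarity) matrix over the training sketches, and the fused kernel is a weighted sum of the per-cue kernels. Real MKL learns the weights jointly with the classifier; the fixed weights and toy matrices below are assumptions for illustration only.

```python
# Sketch of kernel fusion for multi-cue sketch recognition: combine
# per-cue kernel matrices (lists of lists) by a weighted sum. The
# combined matrix can then be fed to any kernelized classifier.

def combine_kernels(kernels, weights):
    """Weighted sum of same-sized square kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights))
             for j in range(n)]
            for i in range(n)]

# toy kernels for two cues (say, shape and texture) over 2 sketches
K_shape = [[1.0, 0.2], [0.2, 1.0]]
K_texture = [[1.0, 0.8], [0.8, 1.0]]
K = combine_kernels([K_shape, K_texture], [0.5, 0.5])
print(K)  # off-diagonal entries average the two cues' similarities
```

A convex combination of positive semi-definite kernels is itself a valid kernel, which is why this simple fusion step is safe to plug into a standard SVM.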

    On the Combination of Game-Theoretic Learning and Multi Model Adaptive Filters

    This paper casts the coordination of a team of robots within the framework of game-theoretic learning algorithms. In particular, a novel variant of fictitious play is proposed in which multi-model adaptive filters are used to estimate the other players' strategies. The proposed algorithm can be used as a coordination mechanism between players when they must take decisions under uncertainty. Each player chooses an action after taking into account the actions of the other players as well as the uncertainty, which can arise either from noisy observations or from the various possible types of the other players. In addition, in contrast to other game-theoretic and heuristic algorithms for distributed optimisation, it is not necessary to find the optimal parameters a priori: various parameter values can be used initially as inputs to different models, and the resulting decisions are then aggregate results over all the parameter values. Simulations are used to test the performance of the proposed methodology against other game-theoretic learning algorithms.
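For context, classical fictitious play (which the paper extends by replacing the empirical-frequency estimate with multi-model adaptive filters) can be sketched as follows; the payoff matrices and round count below are illustrative assumptions, not taken from the paper.

```python
# Sketch of classical fictitious play for a 2-player matrix game. Each
# player tracks the empirical frequency of the opponent's past actions
# and best-responds to that belief each round.

def best_response(payoff, belief):
    """Action maximizing expected payoff against a belief over opponent actions."""
    expected = [sum(p * row[j] for j, p in enumerate(belief)) for row in payoff]
    return max(range(len(expected)), key=expected.__getitem__)

def fictitious_play(payoff_a, payoff_b, rounds=50):
    counts_a, counts_b = [1, 1], [1, 1]   # opponent action counts (uniform prior)
    for _ in range(rounds):
        belief_a = [c / sum(counts_a) for c in counts_a]
        belief_b = [c / sum(counts_b) for c in counts_b]
        a = best_response(payoff_a, belief_a)
        b = best_response(payoff_b, belief_b)
        counts_a[b] += 1                  # player A observes B's action
        counts_b[a] += 1                  # player B observes A's action
    return belief_a, belief_b

# coordination game: both players want to pick the same action
coord = [[1, 0], [0, 1]]
beliefs = fictitious_play(coord, coord)
print(beliefs)  # beliefs concentrate on one coordinated action
```

The paper's variant would replace the simple counting update with a bank of adaptive filters, so that several candidate models of the opponent (e.g. different parameter values) are maintained and weighted simultaneously instead of a single empirical estimate.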