170 research outputs found

    Semantic image retrieval using relevance feedback and transaction logs

    Due to recent improvements in digital photography and storage capacity, storing large collections of images has become possible, and efficient means are needed to retrieve images matching a user's query. Content-based Image Retrieval (CBIR) systems automatically extract image contents based on image features, i.e. color, texture, and shape. Relevance feedback methods are applied to CBIR to integrate users' perceptions and reduce the gap between high-level image semantics and low-level image features. This dissertation improves the precision of a CBIR system in retrieving semantically rich (complex) images by making advancements in three areas of a CBIR system: input, process, and output. The input of the system includes a mechanism that provides the user with the tools required to build and modify her query through feedback. User behavior in CBIR environments is studied, and a new feedback methodology is presented to efficiently capture users' image perceptions. The process element includes image learning and retrieval algorithms. A long-term learner (LTL), an image retrieval algorithm which learns image semantics from prior search results available in the system's transaction history, is developed using Factor Analysis. Another algorithm, a short-term learner (STL) that captures the user's image perceptions based on image features and the user's feedback in the ongoing transaction, is developed based on Linear Discriminant Analysis. A mechanism is then introduced to integrate these two algorithms into one retrieval procedure. Finally, a retrieval strategy comprising learning and searching phases is defined for arranging images in the output of the system. The developed relevance feedback methodology proved to reduce the effect of human subjectivity in providing feedback for complex images. The retrieval algorithms were applied to images with different degrees of complexity. LTL is efficient in extracting the semantics of complex images that have a history in the system. STL is suitable for queries and images that can be effectively represented by their image features. Accordingly, the performance of the system in retrieving images with visual and conceptual complexities improved when both algorithms were applied simultaneously. Finally, the strategy of retrieval phases demonstrated promising results as query complexity increases.
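
    The abstract's STL is based on Linear Discriminant Analysis, whose details are not given here; as a minimal, self-contained illustration of the relevance-feedback loop itself, the classic Rocchio update (an assumption for illustration, not the dissertation's algorithm) moves the query vector toward features of images the user marked relevant and away from non-relevant ones:

```python
# Illustrative Rocchio-style relevance-feedback update. NOTE: the
# dissertation's STL uses Linear Discriminant Analysis; Rocchio is shown
# only as a classic, minimal example of query refinement from feedback.
def rocchio_update(query, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant feature vectors and away
    from non-relevant ones. Vectors are plain lists of floats."""
    dim = len(query)

    def centroid(vectors):
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    rel_c = centroid(relevant)
    non_c = centroid(nonrelevant)
    return [alpha * query[i] + beta * rel_c[i] - gamma * non_c[i]
            for i in range(dim)]

# One feedback round on made-up 2-D image features (e.g. color, texture):
q = rocchio_update([1.0, 0.0],
                   relevant=[[0.0, 1.0], [0.0, 3.0]],
                   nonrelevant=[[4.0, 0.0]])
# q is approximately [0.4, 1.5]
```

    The positive weight on relevant centroids and smaller negative weight on non-relevant ones mirrors the idea of capturing the user's perception from feedback in the ongoing transaction.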

    Quality analyses and improvement for fuzzy clustering and web personalization

    Web mining researchers and practitioners keep innovating, creating new technologies that help web site managers improve their web-based services and facilitate information retrieval by web site users. The increasing amount of information and services offered through the Web, coupled with the increase in web-based transactions, calls for systems that can handle gigantic amounts of usage information efficiently while providing good predictions or recommendations and personalization of web sites. In this thesis we first focus on clustering to obtain a usage model from weblog data and investigate ways to improve clustering quality. We also consider applications, focusing on generating predictions through collaborative filtering, which matches the behavior of a current user with that of past like-minded users. To provide dependable performance analysis and improve clustering quality, we study four fuzzy clustering algorithms and compare their effectiveness and efficiency in web prediction. Dependability concerns led us further to investigate the objectivity of validity indices and to choose a more objective index for assessing the relative performance of the clustering techniques. We also use appropriate statistical testing methods in our experiments to distinguish real differences from those that may be due to sampling or other errors. Our results reconfirm some of the claims made previously about these clustering and prediction techniques, while suggesting the need to assess both cluster validation and prediction quality for a sound comparison of the clustering techniques. To assess the quality of aggregate usage profiles (UPs), we devised a set of criteria that reflect the semantic characterization of UPs and help avoid resorting to subjective human judgment in assessing UPs and clustering quality. We formulate each of these criteria as a computable measure for individual as well as groups of UPs. We applied these criteria in the final phase of fuzzy clustering. The soundness and usability of the criteria have been confirmed through a user survey.
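
    A minimal sketch of the kind of fuzzy clustering studied here is fuzzy c-means with fuzzifier m = 2 on toy 1-D data (the data, deterministic initialisation, and parameters are illustrative assumptions, not the thesis's experimental setup):

```python
def fuzzy_c_means(points, c=2, m=2.0, iters=50):
    """Tiny fuzzy c-means on 1-D data (assumes c >= 2). Returns
    (centers, memberships); memberships[i][j] is the degree to which
    point i belongs to cluster j, and each membership row sums to 1."""
    # Deterministic initialisation: spread centers over the data range.
    lo, hi = min(points), max(points)
    centers = [lo + (hi - lo) * j / (c - 1) for j in range(c)]
    u = [[0.0] * c for _ in points]
    for _ in range(iters):
        # Memberships from relative inverse distances (fuzzifier m).
        for i, x in enumerate(points):
            for j in range(c):
                d_j = abs(x - centers[j]) or 1e-12
                s = sum((d_j / (abs(x - centers[k]) or 1e-12)) ** (2.0 / (m - 1.0))
                        for k in range(c))
                u[i][j] = 1.0 / s
        # Centers as membership-weighted means.
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(points))]
            centers[j] = sum(wi * x for wi, x in zip(w, points)) / sum(w)
    return centers, u

# Two obvious groups around 1.0 and 9.0:
centers, u = fuzzy_c_means([1.0, 1.2, 0.8, 9.0, 9.3, 8.7])
```

    Unlike hard k-means, every point carries a graded membership in every cluster, which is what makes validity indices for assessing fuzzy partitions a non-trivial choice.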

    Relational clustering models for knowledge discovery and recommender systems

    Cluster analysis is a fundamental research field in Knowledge Discovery and Data Mining (KDD). It aims at partitioning a given dataset into homogeneous clusters so as to reflect the natural hidden data structure. Various heuristic or statistical approaches have been developed for analyzing propositional datasets. In relational clustering, however, the existence of multi-type relationships greatly degrades the performance of traditional clustering algorithms. This issue motivates us to find more effective algorithms for cluster analysis on relational datasets. In this thesis we comprehensively study the idea of Representative Objects for approximating data distributions and then design a multi-phase clustering framework for analyzing relational datasets with high effectiveness and efficiency. The second task considered in this thesis is to provide better data models for people as well as machines to browse and navigate a dataset. The hierarchical taxonomy is widely used for this purpose. Compared with manually created taxonomies, automatically derived ones are more appealing because of their low creation/maintenance cost and high scalability. Up to now, taxonomy generation techniques have mainly been used to organize document corpora. We investigate the possibility of applying them to relational datasets and then propose some algorithmic improvements. Another non-trivial problem is how to assign suitable labels to the taxonomic nodes so as to credibly summarize the content of each node. To the best of our knowledge this field has not been investigated sufficiently, so we attempt to fill the gap by proposing some novel approaches. The final goal of our cluster analysis and taxonomy generation techniques is to improve the scalability of recommender systems, which are developed to tackle the problem of information overload. Recent research in recommender systems integrates the exploitation of domain knowledge to improve recommendation quality, which however reduces the scalability of the whole system. We address this issue by applying the automatically derived taxonomy to preserve the pair-wise similarities between items, and then modeling user visits by another hierarchical structure. Experimental results show that the computational complexity of the recommendation procedure can be greatly reduced and the system's scalability thereby improved.
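
    How a taxonomy can preserve pair-wise item similarities can be sketched with the standard Wu-Palmer measure over a toy product taxonomy (both the taxonomy and the choice of measure are assumptions for illustration, not the thesis's exact formulation):

```python
# Hypothetical toy taxonomy as a child -> parent map (root's parent is None).
PARENT = {
    "electronics": None,
    "computers": "electronics", "cameras": "electronics",
    "laptop": "computers", "desktop": "computers",
    "dslr": "cameras",
}

def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def wu_palmer(a, b):
    """Similarity in [0, 1]: the deeper the lowest common ancestor,
    the more similar the two items."""
    pa, pb = path_to_root(a), path_to_root(b)
    ancestors_b = set(pb)
    lca = next(n for n in pa if n in ancestors_b)
    depth = lambda n: len(path_to_root(n)) - 1   # root has depth 0
    return 2.0 * depth(lca) / (depth(a) + depth(b))

# Siblings share a deeper ancestor, so they score higher:
assert wu_palmer("laptop", "desktop") > wu_palmer("laptop", "dslr")
```

    Because similarities are read off the hierarchy rather than computed from item content, a recommender can avoid the pair-wise content comparisons that limit scalability.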

    EThOS – Electronic Theses Online Service; University of Warwick, Dept. of Computer Science, United Kingdom.

    Personalized point of interest recommendations with privacy-preserving techniques

    Location-based services (LBS) have become increasingly popular, with millions of people using mobile devices to access information about nearby points of interest (POIs). Personalized POI recommender systems have been developed to assist users in discovering and navigating these POIs. However, these systems typically require large amounts of user data, including location history and preferences, to provide personalized recommendations, and the collection and use of such data can pose significant privacy concerns. This dissertation proposes a privacy-preserving approach to POI recommendation that addresses these concerns. The proposed approach uses clustering, tabular generative adversarial networks, and differential privacy to generate synthetic user data, allowing for personalized recommendations without revealing individual user data. Specifically, the approach clusters users based on their fuzzy locations, generates synthetic user data using a tabular generative adversarial network, and perturbs user data with differential privacy before it is used for recommendation. The proposed approaches achieve well-balanced trade-offs between accuracy and privacy preservation and can be applied to different recommender systems. The approach is evaluated through extensive experiments on real-world POI datasets, demonstrating that it is effective in providing personalized recommendations while preserving user privacy. The results show that the proposed approach achieves accuracy comparable to traditional POI recommender systems that do not consider privacy, while providing significant privacy guarantees for users. The research's contribution is twofold: it compares different methods for synthesizing user data specifically for POI recommender systems, and it offers a general privacy-preserving framework for different recommender systems. The proposed approach provides a novel solution to the privacy concerns of POI recommender systems, contributes to the development of more trustworthy and user-friendly LBS applications, and can enhance users' trust in these systems.
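
    The differential-privacy perturbation step can be illustrated with the standard Laplace mechanism, a sketch of that one component only (the clustering and GAN stages are omitted, and the count and parameters are made up):

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Release a count with epsilon-differential privacy. One user can
    change a count by at most `sensitivity`, so the Laplace noise scale
    is sensitivity / epsilon (the standard Laplace mechanism)."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical check-in count at one POI, released with epsilon = 0.5:
rng = random.Random(42)
noisy = dp_count(120, epsilon=0.5, rng=rng)  # noise scale = 2.0
```

    Smaller epsilon means a larger noise scale and stronger privacy, which is the accuracy/privacy trade-off the abstract refers to.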

    Open Data

    Open data is freely usable, reusable, or redistributable by anybody, provided there are safeguards in place that protect the data's integrity and transparency. This book describes how data retrieved from public open data repositories can improve the learning qualities of digital networking, particularly performance and reliability. Chapters address such topics as knowledge extraction, Open Government Data (OGD), public dashboards, intrusion detection, and artificial intelligence in healthcare.

    Acoustic data optimisation for seabed mapping with visual and computational data mining

    Oceans cover 70% of Earth's surface, but little is known about their waters. While the echosounders often used to explore our oceans have developed at a tremendous rate since World War II, the methods used to analyse and interpret the data remain the same. These methods are inefficient, time consuming, and often costly when dealing with the large volumes of data that modern echosounders produce. This PhD project examines the complexity of the de facto seabed mapping technique by exploring and analysing acoustic data with a combination of data mining and visual analytics methods. First, we test for redundancy in multibeam echosounder (MBES) data using the component-plane visualisation of a Self-Organising Map (SOM). A total of 16 visual groups were identified among the 132 statistical data descriptors. The optimised MBES dataset had 35 attributes drawn from the 16 visual groups, representing a 73% reduction in data dimensionality. A combined Principal Component Analysis (PCA) + k-means approach was used to cluster both datasets, and the cluster results were compared visually as well as internally validated using four different internal validation methods. Next, we tested two novel approaches to singlebeam echosounder (SBES) data processing and clustering: visual exploration for outlier detection, and direct clustering of time-series echo returns. Visual exploration identified further outliers that the automatic procedure was not able to find. The SBES data were then clustered directly; the internal validation indices suggested an optimal number of three clusters, consistent with the assumption that the SBES time series represent the subsurface classes of the seabed. The SBES data were then joined with the corresponding MBES data by identifying the closest locations between MBES and SBES. Two algorithms, PCA + k-means and fuzzy c-means, were tested and the results visualised. On visual comparison, the cluster boundaries appeared better defined than for the clustered MBES data alone, indicating that adding SBES did in fact improve the boundary definitions. The cluster results from the analysis chapters were then validated against ground-truth data using a confusion matrix and kappa coefficients. For MBES, the classes derived from the optimised data yielded better accuracy than those from the original data. For SBES, direct clustering provided a relatively reliable overview of the underlying classes in the survey area. The combined MBES + SBES data provided by far the best accuracy for mapping, with almost a 10% increase in overall accuracy over the original MBES data. The results are promising for optimising acoustic data and improving the quality of seabed mapping, and these approaches have the potential for significant time and cost savings in the seabed mapping process. Finally, some future directions are recommended for the findings of this research project, which could contribute to further work on seabed mapping problems at mapping agencies worldwide.
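
    The k-means stage of the PCA + k-means pipeline can be sketched with plain Lloyd's iterations on a single descriptor (the toy data and deterministic initialisation are assumptions; the project applied clustering to the 35 optimised MBES attributes):

```python
def kmeans(points, k=2, iters=25):
    """Plain Lloyd's k-means on 1-D features, e.g. one acoustic
    descriptor after dimensionality reduction (assumes k >= 2).
    Deterministic init: spread initial centers over the data range."""
    lo, hi = min(points), max(points)
    centers = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center...
        labels = [min(range(k), key=lambda j: abs(x - centers[j]))
                  for x in points]
        # ...then move each center to the mean of its members.
        for j in range(k):
            members = [x for x, l in zip(points, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers

# Two well-separated backscatter-like groups (made-up values):
labels, centers = kmeans([0.1, 0.2, 0.15, 5.0, 5.2, 4.9])
```

    Internal validation indices, as used in the project, would then be computed on such label assignments to compare candidate cluster counts.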

    Efficient Learning Machines

    Computer science.

    Unsupervised classification of high-dimensional data and knowledge extraction in question-answering Web services

    This thesis by publication studies two distinct problems: 1) unsupervised classification (clustering) of high-dimensional data, and 2) knowledge extraction in question-answering Web services. Our contributions are presented across three chapters. In the first chapter, we propose a projected clustering algorithm named PCKA (Projected Clustering based on the K-means Algorithm). Unlike the vast majority of existing approaches, PCKA can discover cluster structures that exist in different low-dimensional subspaces, using a similarity measure well suited to the particular characteristics of multidimensional data. The reliability of PCKA is illustrated through tests and comparisons with existing approaches on a variety of synthetic and real data. The second chapter addresses the problem of identifying expert users in question-answering Internet forums. Our contribution includes the development of a probabilistic approach based on a mixture model of Gamma distributions. Our approach systematically separates expert users from non-experts, whereas existing approaches provide only a ranked list of users. The third chapter studies the problem of identifying communities in question-answering Internet forums. Our contribution includes the introduction of the new concept of a "knowledge-sharing community", defined by the interactions between expert and non-expert users. To identify this type of community, we represent our environment as transactional data and propose a clustering algorithm named TRANCLUS (TRAnsaction CLUStering). The clusters identified by TRANCLUS represent the communities we seek to discover. Our approach is validated on data extracted from several Yahoo! Answers forums.
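
    TRANCLUS itself is not specified in the abstract; a common building block for clustering transactional data of this kind is the Jaccard similarity between transactions (an assumed illustration, not the algorithm's actual measure):

```python
def jaccard(a, b):
    """Similarity between two transactions (sets of interaction items):
    |intersection| / |union|, in [0, 1]."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty transactions are identical
    return len(a & b) / len(a | b)

# Two forum users who answered questions in overlapping topic sets:
assert jaccard({"python", "java"}, {"java", "c"}) == 1 / 3
```

    Grouping transactions by such set overlap is one way expert/non-expert interaction patterns could be gathered into knowledge-sharing communities.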

    Predicting potential customer needs and wants for agile design and manufacture in an Industry 4.0 environment

    Manufacturing is currently experiencing a paradigm shift in the way products are designed, produced, and serviced, brought about mainly by the extensive use of the Internet and digital technologies. As a result of this shift, a new industrial revolution is emerging, termed "Industry 4.0" (i4), which promises to accommodate mass customisation at a mass production cost. For i4 to become a reality, however, multiple challenges need to be addressed, highlighting the need for design for agile manufacturing and, for this, a framework capable of integrating big data analytics from the service end, business informatics through the manufacturing process, and artificial intelligence (AI) across the entire manufacturing value chain. This thesis attempts to address these issues, with a focus on design for agile manufacturing. First, the state of the art in this field is reviewed, covering the combination of cutting-edge digital manufacturing technologies with big data analytics to support agile manufacturing. The work then focuses on developing an AI-based framework to address one of the customisation issues in smart design and agile manufacturing: predicting potential customer needs and wants. Within this framework, an AI-based approach is developed to predict design attributes that help manufacturers decide on the best virtual designs to meet emerging customer needs and wants predictively. In particular, various machine learning approaches are developed to explain at least 85% of the design variance when building a model to predict potential customer needs and wants. These approaches include k-means clustering, self-organizing maps, fuzzy k-means clustering, and decision trees, together with a support vector machine to evaluate and extract conscious and subconscious customer needs and wants. A model capable of accurately predicting customer needs and wants for at least 85% of classified design attributes is thus obtained. Further, an analysis capable of determining the best design attributes and features for predicting customer needs and wants is also achieved. As the information analysed can be used to advise the selection of desired attributes, it is fed back in a closed loop of the manufacturing value chain: design → manufacture → management/service → design. Four case studies are undertaken to test and demonstrate the efficacy and effectiveness of the framework developed: 1) an evaluation model of consumer cars with multiple attributes, both categorical and numerical; 2) specifications of automotive vehicles in terms of various characteristics, including categorical and numerical instances; 3) fuel consumption of various car models and makes, taking into account a desire for low fuel costs and low CO2 emissions; and 4) computer-parts design, recommending the best design attributes when buying a computer. The results show that decision trees, as a machine learning approach, work best in predicting customer needs and wants for smart design. With the tested framework and methodology, this thesis presents a holistic attempt at addressing the missing link between manufacture and customisation, that is, meeting customer needs and wants. Effective ways of achieving customisation for i4 and smart manufacturing are identified: potential customer needs and wants are predicted, and the predictions are applied at the product design stage so that agile manufacturing can meet individual requirements at a mass production cost. Such agility is one key element in realising Industry 4.0. Finally, this thesis contributes to improving the process of analysing data to predict potential customer needs and wants for use as inputs to customising product designs agilely.
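
    The decision-tree result can be illustrated with a single impurity-minimising split (a decision stump) on one hypothetical design attribute; this is a sketch of the splitting criterion only, not the thesis's full models:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_stump(xs, ys):
    """Exhaustively pick the threshold on one numeric design attribute
    that minimises the weighted Gini impurity of the resulting split."""
    best = (float("inf"), None)
    n = len(ys)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
        if score < best[0]:
            best = (score, t)
    return best  # (weighted impurity, threshold)

# Made-up "price" attribute vs. a binary "wants it" label:
impurity, threshold = best_stump([1, 2, 3, 10, 11, 12],
                                 ["yes", "yes", "yes", "no", "no", "no"])
```

    A full decision tree recursively applies such splits over many attributes, which is why it can both predict needs and wants and reveal which design attributes matter most.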