
    Frequent-pattern based iterative projected clustering

    Irrelevant attributes add noise to high-dimensional clusters and make traditional clustering techniques inappropriate. Projected clustering algorithms have been proposed to find clusters in hidden subspaces. We observe the analogy between mining frequent itemsets and discovering the relevant subspace for a given cluster. We propose a methodology for finding projected clusters by mining frequent itemsets and present heuristics that improve its quality. Our techniques are evaluated with synthetic and real data; they are scalable and discover projected clusters accurately. © 2003 IEEE.
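    The analogy between frequent itemsets and relevant subspaces can be illustrated with a minimal sketch. It assumes, as an illustration rather than the paper's exact procedure, that each point becomes a transaction listing the dimensions where it lies within a window w of a candidate medoid, so a frequent itemset is a set of dimensions shared by many points. The score support * (1/beta)**|dims| is modelled on the quality function used by DOC-style projected clustering methods (an assumption here); the function names and all data are hypothetical.

```python
from itertools import combinations


def to_transactions(points, medoid, w):
    """Turn each point into the set of dimensions where it lies within w of the medoid."""
    return [
        frozenset(d for d, (x, m) in enumerate(zip(p, medoid)) if abs(x - m) <= w)
        for p in points
    ]


def frequent_subspaces(transactions, n_dims, min_support):
    """Enumerate every dimension set (itemset) whose support is >= min_support.

    Brute force, exponential in n_dims; a real miner such as Apriori or
    FP-growth would prune the search instead.
    """
    frequent = {}
    for size in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), size):
            itemset = frozenset(dims)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent[itemset] = support
    return frequent


def best_subspace(frequent, beta=0.25):
    """Pick the itemset maximizing the assumed score support * (1/beta)**len(dims)."""
    return max(frequent.items(), key=lambda kv: kv[1] * (1 / beta) ** len(kv[0]))


# Hypothetical toy data: 3-D points, one candidate medoid, window width w = 0.5.
points = [(1.0, 5.0, 9.0), (1.1, 5.2, 0.0), (0.9, 4.9, 3.0), (8.0, 5.1, 9.1)]
medoid = points[0]
transactions = to_transactions(points, medoid, w=0.5)
frequent = frequent_subspaces(transactions, n_dims=3, min_support=3)
dims, support = best_subspace(frequent)
print(sorted(dims), support)  # [0, 1] 3
```

    The score trades cluster size against subspace size: {1} has the highest support (4), but {0, 1} wins because it is supported by three points in two dimensions.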

    Path Clustering: Grouping in a Efficient Way Complex Data Distributions

    This work proposes an algorithm that uses paths based on tile segmentation to build complex clusters. After data items (points) are allocated to geometric shapes in tile format, the complexity of our algorithm depends on the number of tiles instead of the number of points. The main novelty is the way our algorithm traverses the grid, which saves time while providing good results. It requires no configuration parameters from users, which makes it easier to use than other strategies. In addition, the algorithm does not create overlapping clusters, which simplifies the interpretation of results.
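    The tile idea can be sketched under stated assumptions: points are binned into square tiles, and non-empty tiles that share an edge are merged by breadth-first search, so the merging work depends on the number of tiles rather than the number of points. This is a generic grid-connectivity sketch, not the paper's path-based traversal, and unlike the described algorithm it takes a user-supplied tile_size; the function name tile_clusters is hypothetical.

```python
from collections import deque


def tile_clusters(points, tile_size):
    """Label 2-D points by (1) binning them into square tiles and (2) joining
    non-empty tiles that share an edge. Step 2 visits tiles, not points."""
    tiles = {}
    for i, (x, y) in enumerate(points):
        key = (int(x // tile_size), int(y // tile_size))
        tiles.setdefault(key, []).append(i)

    labels = [-1] * len(points)
    seen = set()
    cluster = 0
    for start in tiles:
        if start in seen:
            continue
        # Breadth-first search over edge-adjacent non-empty tiles.
        seen.add(start)
        queue = deque([start])
        while queue:
            tx, ty = queue.popleft()
            for i in tiles[(tx, ty)]:
                labels[i] = cluster
            for nb in ((tx + 1, ty), (tx - 1, ty), (tx, ty + 1), (tx, ty - 1)):
                if nb in tiles and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        cluster += 1
    return labels


labels = tile_clusters(
    [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5), (8.5, 8.5), (9.5, 8.5)], tile_size=1.0
)
print(labels)  # [0, 0, 0, 1, 1]
```

    Because every point ends up in exactly one tile and every tile in exactly one component, the resulting clusters cannot overlap, which matches the non-overlapping property the abstract highlights.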

    Case-based group recommendations for configurable products using multidimensional clustering

    Recommender systems provide personalized suggestions to end users about items or topics they are likely to find interesting. Modern recommenders help users select items of a specific kind, for instance films, books or songs. This thesis focuses on a relatively new field of recommender systems concerning configurable products, which consist of individual attributes or parts that are recommended to the user as a whole, for example a customizable PC. Recommenders usually rely on collaborative filtering methods, which predict items for a new user based on the preferences of other, similar users. Apart from collaborative filtering, recommenders also use other data mining techniques such as clustering and classification. The aim of this thesis is to propose an effective approach for recommending configurable products to groups of users. We describe the creation of a hybrid collaborative filtering and case-based recommender system that proposes configurations to groups by applying multidimensional clustering and classification algorithms. Specifically, we use demographic data and users' preferences to cluster users into multiple classes, and then build a model that classifies each new user into one of these classes. New users are grouped by class, and recommendations are provided to each group based on the configurations of the registered users whose preferences are most similar to the group's aggregated preferences. Experimental evaluation demonstrates that integrating multidimensional clustering improves the precision of the recommendations. The system also addresses the main problems of collaborative filtering approaches: the sparsity of ratings (for new items) and the cold-start problem (for new users).
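    The group-recommendation pipeline described above might look roughly like the following sketch. It assumes cluster centroids have already been learned offline, classifies new users by nearest centroid, averages each group's preference vectors, and returns the configuration of the registered user with the highest cosine similarity to that average. Nearest-centroid classification, cosine similarity, mean aggregation, and all names and data (recommend_for_groups, "gaming PC", and so on) are illustrative assumptions, not the thesis's actual choices.

```python
import math
from collections import defaultdict


def nearest_centroid(vec, centroids):
    """Classify a user vector into the cluster whose centroid is closest (Euclidean)."""
    return min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recommend_for_groups(new_users, centroids, registered):
    """new_users: {user_id: preference vector}
    registered: list of (preference vector, configuration) for existing users.
    Returns {cluster_id: configuration} for each group of new users."""
    groups = defaultdict(list)
    for vec in new_users.values():
        groups[nearest_centroid(vec, centroids)].append(vec)

    recommendations = {}
    for cid, vecs in groups.items():
        # Aggregate the group's preferences by averaging each dimension.
        mean = [sum(col) / len(vecs) for col in zip(*vecs)]
        # Recommend the configuration of the registered user most similar to the group.
        _, config = max(registered, key=lambda r: cosine(r[0], mean))
        recommendations[cid] = config
    return recommendations


centroids = [(1.0, 0.0), (0.0, 1.0)]
new_users = {"a": (0.9, 0.1), "b": (0.8, 0.3), "c": (0.1, 0.9)}
registered = [((1.0, 0.1), "gaming PC"), ((0.1, 1.0), "office PC")]
print(recommend_for_groups(new_users, centroids, registered))
# {0: 'gaming PC', 1: 'office PC'}
```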

    ConQueSt: a Constraint-based Querying System for Exploratory Pattern Discovery

    The contribution of this thesis is the design and development of a Knowledge Discovery system named ConQueSt. Based on the constraint-based pattern discovery paradigm, ConQueSt follows the Inductive Database vision:
    • mining is viewed as a more complex form of querying;
    • the system is therefore equipped with a data mining query language and tightly coupled with a DBMS;
    • patterns extracted by mining queries become first-class citizens and, following the closure principle, are materialized alongside the data in the DBMS.
    ConQueSt has already been presented successfully at the international workshop of the IDB community and at the prestigious IEEE International Conference on Data Engineering (ICDE 2006). In June it will be presented at the Italian database conference (SEBD 2006). Submission to a prestigious journal is currently under way.
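    Constraint-based pattern discovery of the kind ConQueSt targets can be illustrated with a small Apriori-style miner that pushes a constraint into the search. The sketch assumes a hypothetical anti-monotone constraint (total item price at most a budget); since frequency is also anti-monotone, any itemset violating either condition is pruned together with its supersets. It does not reproduce ConQueSt's query language or its DBMS integration, and the function name, prices and data are made up.

```python
from itertools import combinations


def constrained_frequent_itemsets(transactions, min_support, price, max_total):
    """Level-wise (Apriori-style) mining of itemsets that are frequent AND satisfy
    the anti-monotone constraint sum(price[i]) <= max_total."""
    transactions = [frozenset(t) for t in transactions]

    def ok(itemset):
        support = sum(1 for t in transactions if itemset <= t)
        return support >= min_support and sum(price[i] for i in itemset) <= max_total

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if ok(frozenset([i]))]
    result = list(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two surviving (k-1)-itemsets with exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself have survived.
        candidates = [
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        ]
        level = [c for c in candidates if ok(c)]
        result.extend(level)
        k += 1
    return result


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
price = {"a": 10, "b": 20, "c": 50}
result = constrained_frequent_itemsets(baskets, min_support=2, price=price, max_total=40)
print(sorted(sorted(s) for s in result))  # [['a'], ['a', 'b'], ['b']]
```

    Item "c" is frequent but too expensive, so it and every superset are discarded at the first level without ever being counted again.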

    Improving k-nn search and subspace clustering based on local intrinsic dimensionality

    In several novel applications, such as multimedia and recommender systems, data is often represented as object feature vectors in high-dimensional spaces. High-dimensional data remains a challenge for state-of-the-art algorithms because of the so-called curse of dimensionality: as dimensionality increases, the discriminative ability of similarity measures diminishes to the point where many data analysis algorithms that depend on them, such as similarity search and clustering, lose their effectiveness. One way to handle this challenge is to select the most important features, which is essential both for providing compact object representations and for improving overall search and clustering performance. Compact feature vectors can further reduce storage space and the computational complexity of search and learning tasks. Support-Weighted Intrinsic Dimensionality (support-weighted ID) is a promising new feature selection criterion that estimates the contribution of each feature to the overall intrinsic dimensionality. Support-weighted ID identifies relevant features locally for each object and penalizes features that have lower local discriminative power as well as higher density. In effect, support-weighted ID measures the ability of each feature to discriminate locally between objects in the dataset. Based on support-weighted ID, this dissertation makes three main research contributions. First, it proposes NNWID-Descent, a similarity graph construction method that uses the support-weighted ID criterion to identify and retain relevant features locally for each object and thereby enhance overall graph quality.
    Second, to improve the accuracy and performance of cluster analysis, it introduces k-LIDoids, a subspace clustering algorithm that extends support-weighted ID to a clustering framework in order to gradually select the subset of informative features for each cluster. k-LIDoids constructs clusters while also finding a low-dimensional subspace for each cluster. Finally, using the compact object and cluster representations from NNWID-Descent and k-LIDoids, the dissertation defines LID-Fingerprint, a new binary fingerprinting and multi-level indexing framework for high-dimensional data. LID-Fingerprint can be used to hide information from passive adversaries while providing efficient and secure similarity search and retrieval for data stored in the cloud. Compared with other state-of-the-art algorithms, the proposed algorithms show good practical performance, which provides evidence of their effectiveness for high-dimensional data.
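    As background for the LID-based criteria above, the sketch below computes the standard maximum-likelihood estimate of local intrinsic dimensionality from a point's k-nearest-neighbour distances. It does not implement support-weighted ID, NNWID-Descent, k-LIDoids or LID-Fingerprint, whose exact definitions are not given here; the helper names are hypothetical and neighbours are found by brute force.

```python
import math


def knn_distances(data, q_index, k):
    """Distances from data[q_index] to its k nearest other points (brute force)."""
    q = data[q_index]
    dists = sorted(math.dist(q, p) for j, p in enumerate(data) if j != q_index)
    return dists[:k]


def lid_mle(distances):
    """Maximum-likelihood estimate of local intrinsic dimensionality:

        LID = -( (1/k) * sum_i ln(d_i / d_k) )^(-1),  with d_k the largest distance.
    """
    d = sorted(distances)
    if not d or d[0] <= 0:
        raise ValueError("need k >= 1 strictly positive neighbour distances")
    d_k = d[-1]
    mean_log = sum(math.log(x / d_k) for x in d) / len(d)
    if mean_log == 0:
        raise ValueError("all neighbour distances are equal; estimate undefined")
    return -1.0 / mean_log


data = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
print(round(lid_mle(knn_distances(data, 0, 3)), 4))  # 1.4427, i.e. 1 / ln 2
```

    Per-feature criteria such as support-weighted ID build on estimates of this kind; with only k = 3 neighbours the value is very noisy, so practical uses take larger neighbourhoods.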

    Unsupervised learning on social data
