    Combining privileged information to improve context-aware recommender systems

    A recommender system is an information filtering technology that can be used to predict preference ratings for items (products, services, movies, etc.) and/or to produce a ranking of items likely to be of interest to the user. Context-aware recommender systems (CARS) learn and predict the tastes and preferences of users by incorporating available contextual information into the recommendation process. One of the major challenges in context-aware recommender systems research is the lack of automatic methods for obtaining contextual information for these systems. In this scenario, we propose to use contextual information drawn from topic hierarchies of the items (web pages) to improve the performance of context-aware recommender systems. The topic hierarchies are constructed by an extension of the LUPI-based Incremental Hierarchical Clustering method that considers three types of information: the traditional bag-of-words (technical information) and the combination of named entities (privileged information I) with domain terms (privileged information II). We evaluated the contextual information in four context-aware recommender systems, assigning different weights to each type of information. The empirical results demonstrate that topic hierarchies combining the two kinds of privileged information can provide better recommendations. Funding: FAPESP (São Paulo Research Foundation, grants #2010/20564-8, #2012/13830-9, and #2013/16039-3) and CAPES.
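    As a rough illustration of the idea of weighting and combining several textual views of an item before clustering, the sketch below builds one feature matrix from three views of a set of web pages and clusters the pages to obtain topic labels that could serve as context for a recommender. It is not the authors' LUPI-based implementation: the toy pages, the restricted vocabularies standing in for named-entity and domain-term extraction, and the view weights are illustrative assumptions.

```python
# Minimal sketch: combine three weighted token views of each web page
# (bag-of-words, "named entities", "domain terms") and cluster the pages
# to obtain topic labels usable as contextual information in a CARS.
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

pages = [
    "new smartphone review battery camera",
    "football match final score goal",
    "smartphone camera sensor megapixel review",
    "championship final football team win",
]

# Three views of the same pages: full text, plus restricted vocabularies
# standing in for named-entity and domain-term extraction (a real system
# would run NER and term extraction here).
bow_vec = TfidfVectorizer()
ner_vec = TfidfVectorizer(vocabulary=["smartphone", "football"])
term_vec = TfidfVectorizer(vocabulary=["camera", "battery", "goal", "score"])

# Weight each view before concatenating; the weights are arbitrary here.
w_bow, w_ner, w_term = 0.4, 0.3, 0.3
X = hstack([
    w_bow * bow_vec.fit_transform(pages),
    w_ner * ner_vec.fit_transform(pages),
    w_term * term_vec.fit_transform(pages),
]).toarray()

# Agglomerative clustering stands in for the incremental hierarchical step;
# the resulting cluster id of each page is its topic label.
topics = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(dict(zip(pages, topics)))
```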

    Hierarchical Co-Clustering: Off-line and Incremental Approaches

    Clustering data is challenging for two main reasons. First, the dimensionality of the data is often very high, which makes cluster interpretation hard; moreover, with high-dimensional data the classic metrics fail to identify the real similarities between objects. Second, the observed phenomena evolve, so the datasets accumulate over time. In this paper we show how we address these problems. To tackle the high-dimensionality problem, we apply a co-clustering approach to the dataset that stores the occurrence of features in the observed objects. Co-clustering computes a partition of the objects and a partition of the features simultaneously. The novelty of our co-clustering solution is that it arranges the clusters in a hierarchical fashion: it consists of two hierarchies, one on the objects and one on the features. The two hierarchies are coupled, because the clusters at a given level in one hierarchy are paired with the clusters at the same level of the other hierarchy, together forming the co-clusters. Each cluster of one hierarchy thus provides insights into the clusters of the other. Another novelty of the proposed solution is that the number of clusters is not bounded a priori; nevertheless, the produced hierarchies remain compact, and therefore more readable, because our method allows multiple splits of a cluster at the lower level. As regards the second challenge, the accumulating nature of the data makes the datasets intractably large over time. In this case an incremental solution alleviates the issue because it partitions the problem. We therefore introduce an incremental version of our hierarchical co-clustering algorithm: it starts from an intermediate solution computed on the previous version of the data and updates the co-clustering results considering only the newly added block of data. This speeds up the computation with respect to the original approach, which would recompute the result on the overall dataset, while guaranteeing approximately the same answer as the original version at a much lower computational cost. We validate the incremental approach on several high-dimensional datasets and perform an accurate comparison with both the original version of our algorithm and state-of-the-art competitors. The results open the way to a novel usage of co-clustering algorithms in which it is advantageous to partition the data into several blocks and process them incrementally, thus "incorporating" data gradually into an ongoing co-clustering solution.
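    The incremental idea, folding a newly arrived block of data into an existing co-clustering instead of recomputing everything, can be sketched as follows. This is not the authors' algorithm: the toy matrix, the fixed numbers of row and column clusters, and the nearest-centroid assignment rule are assumptions made purely for illustration.

```python
# Minimal sketch: given an existing co-clustering (a partition of rows/objects
# and of columns/features), assign a new block of rows to the current row
# clusters, touching only the new block rather than re-clustering the matrix.
import numpy as np

rng = np.random.default_rng(0)
X_old = rng.random((100, 40))          # data already co-clustered
row_labels = rng.integers(0, 4, 100)   # existing partition of objects
col_labels = rng.integers(0, 3, 40)    # existing partition of features
k_rows, k_cols = 4, 3

def reduce_cols(X, col_labels, k):
    """Summarise rows in the reduced 'column-cluster' space."""
    return np.stack([X[:, col_labels == j].mean(axis=1) for j in range(k)], axis=1)

# One centroid per existing row cluster, expressed over column clusters.
centroids = np.stack([
    reduce_cols(X_old[row_labels == i], col_labels, k_cols).mean(axis=0)
    for i in range(k_rows)
])

# Incremental step: a new block of rows arrives; assign each row to the
# nearest existing row-cluster centroid.
X_new = rng.random((10, 40))
reduced_new = reduce_cols(X_new, col_labels, k_cols)
new_labels = np.argmin(
    ((reduced_new[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2), axis=1
)
print(new_labels)
```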

    Modelling the acquisition of natural language categories

    The ability to reason about categories and category membership is fundamental to human cognition, and as a result a considerable amount of research has explored the acquisition and modelling of categorical structure from a variety of perspectives. These range from feature norming studies involving adult participants (McRae et al. 2005) to long-term infant behavioural studies (Bornstein and Mash 2010) to modelling experiments involving artificial stimuli (Quinn 1987). In this thesis we focus on the task of natural language categorisation, modelling the cognitively plausible acquisition of semantic categories for nouns based on purely linguistic input. Focusing on natural language categories and linguistic input allows us to use the tools of distributional semantics to create high-quality representations of meaning in a fully unsupervised fashion, a property not commonly seen in traditional studies of categorisation. We explore how natural language categories can be represented using distributional models of semantics: we construct concept representations from corpora, evaluate their performance against psychological representations based on human-produced features, and show that distributional models can provide a high-quality substitute for equivalent feature representations. Having shown that corpus-based concept representations can be used to model category structure, we turn to the task of modelling category acquisition and exploring how category structure evolves over time. We identify two key properties necessary for cognitive plausibility in a model of category acquisition, incrementality and non-parametricity, and construct a pair of models designed around these constraints. Both models are based on a graphical representation of semantics in which a category corresponds to a densely connected subgraph. The first model identifies such subgraphs and uses them to extract a flat organisation of concepts into categories; the second uses a generative approach to identify implicit hierarchical structure and extract a hierarchical category organisation. We compare both models against existing methods of identifying category structure in corpora and find that they outperform their counterparts on a variety of tasks. Furthermore, the incremental nature of our models allows us to predict the structure of categories during formation and thus to model category acquisition more accurately, a task to which batch-trained exemplar and prototype models are poorly suited.
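    The graph-based intuition, concepts as nodes, edges between distributionally similar concepts, and categories read off as densely connected regions, can be sketched as follows. This is not one of the thesis models: connected components of a thresholded similarity graph stand in for "densely connected subgraphs", and the toy vectors and threshold are assumptions.

```python
# Minimal sketch: link concepts whose cosine similarity exceeds a threshold,
# add a new concept incrementally, and read categories off the graph.
import numpy as np
import networkx as nx

concepts = {            # toy "distributional" vectors
    "dog":   np.array([0.9, 0.1, 0.0]),
    "cat":   np.array([0.8, 0.2, 0.1]),
    "car":   np.array([0.1, 0.9, 0.2]),
    "truck": np.array([0.0, 0.8, 0.3]),
}

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

THRESH = 0.9
G = nx.Graph()
G.add_nodes_from(concepts)
for a in concepts:
    for b in concepts:
        if a < b and cos(concepts[a], concepts[b]) > THRESH:
            G.add_edge(a, b)

# Incremental step: a new concept arrives and is linked into the graph;
# category structure is re-read locally rather than rebuilt from scratch.
concepts["bus"] = np.array([0.05, 0.85, 0.25])
G.add_node("bus")
for b in list(G.nodes):
    if b != "bus" and cos(concepts["bus"], concepts[b]) > THRESH:
        G.add_edge("bus", b)

print([sorted(c) for c in nx.connected_components(G)])
```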

    Exploiting Homogeneity of Density in Incremental Hierarchical Clustering

    Hierarchical clustering is an important tool in many applications. When the data set is large and keeps growing over time, periodically reclustering it from scratch is inefficient, so the ability to incorporate new data incrementally into an existing hierarchy is increasingly in demand. This article describes HOMOGEN, a system that employs a new algorithm for generating a hierarchy of concepts and clusters incrementally from a stream of observations. The system aims to construct a hierarchy that satisfies the homogeneity and monotonicity properties. Working in a bottom-up fashion, a new observation is placed in the hierarchy, and a sequence of hierarchy restructuring processes is performed only in regions affected by the presence of the new observation. In addition, the system combines multiple restructuring techniques that address different restructuring objectives to obtain a synergistic effect. The system has been tested on a variety of domains, including structured and unstructured data sets. The experimental results reveal that it constructs a concept hierarchy that is consistent regardless of the input data order and whose quality is comparable to that of hierarchies produced by non-incremental clustering algorithms.
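    A minimal sketch of the incremental mechanics described above, descending an existing hierarchy toward the closest region and updating statistics only along the visited path, is given below. It is not HOMOGEN itself: the restructuring operators that enforce homogeneity and monotonicity (and that give the hierarchy its depth) are omitted, and the node structure and sibling-attach rule are simplifying assumptions.

```python
import numpy as np

class Node:
    """A hierarchy node holding a running centroid of the points below it."""
    def __init__(self, point=None):
        self.children = []
        self.centroid = None if point is None else np.asarray(point, float)
        self.size = 0 if point is None else 1

def insert(root, point):
    """Descend toward the nearest child at each level, attach the new
    observation next to the closest existing leaf, and update centroids
    only along the visited path (the local, incremental part)."""
    point = np.asarray(point, float)
    node, path = root, [root]
    while node.children:
        node = min(node.children,
                   key=lambda c: np.linalg.norm(c.centroid - point))
        path.append(node)
    parent = path[-2] if len(path) > 1 else root
    parent.children.append(Node(point))
    for n in (path[:-1] or [root]):        # ancestors of the new leaf
        if n.centroid is None:
            n.centroid, n.size = point.copy(), 1
        else:
            n.centroid = (n.centroid * n.size + point) / (n.size + 1)
            n.size += 1

root = Node()
for p in [[0, 0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]:
    insert(root, p)
print(len(root.children), "children under the root,",
      "root centroid:", root.centroid.round(2))
```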

    A survey of outlier detection methodologies

    Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise from mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error, or simply natural deviations in populations. Detecting them can identify system faults and fraud before they escalate with potentially catastrophic consequences, and it can identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of computer science and statistics. In this paper, we present a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
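    As a concrete, if very simple, instance of the statistical techniques such surveys cover, the sketch below flags observations whose z-score exceeds a threshold; the data and threshold are illustrative assumptions, not an example taken from the paper.

```python
import numpy as np

# Toy one-dimensional sample with one injected anomaly.
data = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 25.0])

# Classic z-score rule: flag points far from the mean in standard-deviation units.
z = (data - data.mean()) / data.std()
THRESHOLD = 2.5
outliers = data[np.abs(z) > THRESHOLD]
print(outliers)    # -> [25.]
```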

    Applying multi-view based metadata in personalized ranking for recommender systems

    In this paper, we propose a multi-view technique for extracting metadata from unstructured textual content so that it can be applied in recommendation algorithms based on latent factors. The solution aims to reduce the intense and time-consuming human effort required to identify, collect, and label descriptions of the items. Our proposal uses an unsupervised learning method to construct topic hierarchies, with named entity recognition providing privileged information. We evaluate the technique with different recommendation algorithms and show that better accuracy is obtained when this additional information about the items is considered. Funding: São Paulo Research Foundation (FAPESP, grants 2012/13830-9, 2013/16039-3, and 2013/22547-1) and CAPES.
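    One way such extracted metadata can enter a latent-factor model, in the spirit of feature-based matrix factorisation rather than the paper's exact algorithm, is to represent an item by its own factor plus the factors of its metadata topics, so that items sharing topics share signal. The sketch below shows a single gradient step on one rating; the dimensions, topic assignments, and the toy rating are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_topics, k = 5, 4, 2, 8

# Latent factors for users, items, and item-metadata topics.
user_f = rng.normal(scale=0.1, size=(n_users, k))
item_f = rng.normal(scale=0.1, size=(n_items, k))
topic_f = rng.normal(scale=0.1, size=(n_topics, k))

# Output of a metadata-extraction step: topic label(s) attached to each item.
item_topics = {0: [0], 1: [1], 2: [0], 3: [1]}

def predict(u, i):
    # An item is its own factor plus the factors of its extracted topics.
    item_vec = item_f[i] + topic_f[item_topics[i]].sum(axis=0)
    return float(user_f[u] @ item_vec)

# One stochastic gradient step on a single observed rating (user 2, item 0, rating 4.0).
u, i, r, lr = 2, 0, 4.0, 0.05
err = r - predict(u, i)
item_vec = item_f[i] + topic_f[item_topics[i]].sum(axis=0)
user_old = user_f[u].copy()
user_f[u] += lr * err * item_vec
item_f[i] += lr * err * user_old
for t in item_topics[i]:
    topic_f[t] += lr * err * user_old
print(round(predict(u, i), 3))
```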