93,567 research outputs found

    A review of associative classification mining

    Associative classification mining is a promising approach in data mining that uses association rule discovery techniques to construct classification systems, also known as associative classifiers. In the last few years, a number of associative classification algorithms have been proposed, e.g. CPAR, CMAR, MCAR, MMAC and others. These algorithms employ several different rule discovery, rule ranking, rule pruning, rule prediction and rule evaluation methods. This paper surveys and compares the state-of-the-art associative classification techniques with regard to the above criteria. Finally, future directions in associative classification, such as incremental learning and mining low-quality data sets, are also highlighted.

    Off-line vs. On-line Evaluation of Recommender Systems in Small E-commerce

    In this paper, we present our work towards comparing on-line and off-line evaluation metrics in the context of small e-commerce recommender systems. Recommending in small e-commerce enterprises is rather challenging due to the lower volume of interactions and low user loyalty, rarely extending beyond a single session. On the other hand, we usually have to deal with lower volumes of objects, which are easier for users to discover through various browsing/searching GUIs. The main goal of this paper is to determine the applicability of off-line evaluation metrics in learning the true usability of recommender systems (evaluated on-line in A/B testing). In total, 800 variants of recommending algorithms were evaluated off-line w.r.t. 18 metrics covering rating-based, ranking-based, novelty and diversity evaluation. The off-line results were afterwards compared with the on-line evaluation of 12 selected recommender variants and, based on the results, we tried to learn and utilize an off-line to on-line results prediction model. Off-line results showed a great variance in performance w.r.t. different metrics, with the Pareto front covering 68% of the approaches. Furthermore, we observed that on-line results are considerably affected by the novelty of users. On-line metrics correlate positively with ranking-based metrics (AUC, MRR, nDCG) for novice users, while too high values of diversity and novelty had a negative impact on the on-line results for them. For users with more visited items, however, diversity became more important, while the relevance of ranking-based metrics gradually decreased. Comment: Submitted to ACM Hypertext 2020 Conference
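To make two of the ranking-based metrics named in this abstract concrete, here is a minimal sketch of MRR and binary-relevance nDCG. It follows the common textbook definitions and is not taken from the paper; the function names and the set-of-relevant-items input format are assumptions made for illustration.

```python
import math


def reciprocal_rank(ranked_items, relevant):
    """Return 1/rank of the first relevant item, or 0.0 if none is ranked."""
    for position, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings, relevant_sets):
    """Average the reciprocal rank over several users (hypothetical input:
    one ranked list and one set of relevant items per user)."""
    if not rankings:
        return 0.0
    pairs = zip(rankings, relevant_sets)
    return sum(reciprocal_rank(r, rel) for r, rel in pairs) / len(rankings)


def ndcg(ranked_items, relevant, k=None):
    """Binary-relevance nDCG over the top-k items (whole list if k is None)."""
    top = ranked_items if k is None else ranked_items[:k]
    # Discounted gain: a hit at position p contributes 1 / log2(p + 1).
    dcg = sum(
        1.0 / math.log2(pos + 1)
        for pos, item in enumerate(top, start=1)
        if item in relevant
    )
    # Ideal ordering places all relevant items first.
    ideal_hits = min(len(relevant), len(top))
    idcg = sum(1.0 / math.log2(pos + 1) for pos in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

Both functions score a single recommendation list against the items the user actually interacted with; an off-line evaluation would average them over all test users.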

    Surrogate Functions for Maximizing Precision at the Top

    The problem of maximizing precision at the top of a ranked list, often dubbed Precision@k (prec@k), finds relevance in myriad learning applications such as ranking, multi-label classification, and learning with severe label imbalance. However, despite its popularity, there exist significant gaps in our understanding of this problem and its associated performance measure. The most notable of these is the lack of a convex upper bounding surrogate for prec@k. We also lack scalable perceptron and stochastic gradient descent algorithms for optimizing this performance measure. In this paper we make key contributions in these directions. At the heart of our results is a family of truly upper bounding surrogates for prec@k. These surrogates are motivated in a principled manner and enjoy attractive properties such as consistency to prec@k under various natural margin/noise conditions. These surrogates are then used to design a class of novel perceptron algorithms for optimizing prec@k with provable mistake bounds. We also devise scalable stochastic gradient descent style methods for this problem with provable convergence bounds. Our proofs rely on novel uniform convergence bounds which require an in-depth analysis of the structural properties of prec@k and its surrogates. We conclude with experimental results comparing our algorithms with state-of-the-art cutting plane and stochastic gradient algorithms for maximizing prec@k. Comment: To appear in the proceedings of the 32nd International Conference on Machine Learning (ICML 2015)
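For readers unfamiliar with the performance measure itself, the following is a minimal sketch of prec@k under its common definition: the fraction of the k highest-scored items that are truly positive. It illustrates only the metric, not the paper's surrogates or algorithms, and the function name and list-based inputs are assumptions.

```python
def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scored items whose label is positive.

    scores: real-valued model scores, one per item.
    labels: 0/1 ground-truth labels aligned with scores.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    # Rank item indices by descending score; ties keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in order[:k]) / k
```

Because the measure depends on a sort followed by a hard cut-off at k, it is neither convex nor continuous in the scores, which is why the abstract's search for convex upper bounding surrogates matters.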

    Information filtering via preferential diffusion

    Recommender systems have shown great potential to address the information overload problem, namely to help users find interesting and relevant objects within a huge information space. Some physical dynamics, including the heat conduction process and mass or energy diffusion on networks, have recently found applications in personalized recommendation. Most previous studies focus overwhelmingly on recommendation accuracy as the only important factor, while overlooking the significance of diversity and novelty, which indeed provide the vitality of the system. In this paper, we propose a recommendation algorithm based on the preferential diffusion process on a user-object bipartite network. Numerical analyses on two benchmark datasets, MovieLens and Netflix, indicate that our method outperforms the state-of-the-art methods. Specifically, it can not only provide more accurate recommendations, but also generate more diverse and novel recommendations by accurately recommending unpopular objects. Comment: 12 pages, 10 figures, 2 tables

    Sound ranking algorithms for XML search

    Ranking algorithms for XML should reflect the actual combined content and structure constraints of queries, while at the same time producing equal rankings for queries that are semantically equal. Ranking algorithms that produce different rankings for queries that are semantically equal are easily detected by tests on large databases: we call such algorithms not sound. We report the behavior of different approaches to ranking content-and-structure queries on pairs of queries for which we expect equal ranking results from the query semantics. We show that most of these approaches are not sound. Of the remaining approaches, only 3 adhere to the W3C XQuery Full-Text standard.