49 research outputs found

    Sequential Pattern Mining with Multidimensional Interval Items

    Get PDF
    In real sequence pattern mining scenarios, the interval information between two item sets is very important. However, although existing algorithms can effectively mine frequent subsequence sets, the interval information is ignored. This paper aims to mine sequential patterns with multidimensional interval items in sequence databases. In order to address this problem, this paper defines and specifies the interval event problem in the sequential pattern mining task. Then, the interval event items framework is proposed to handle the multidimensional interval event items. Moreover, the MII-Prefixspan algorithm is introduced for the sequential pattern with multidimensional interval event items mining tasks. This algorithm adds the processing of interval event items in the mining process. We can get richer and more in line with actual needs information from mined sequence patterns through these methods. This scheme is applied to the actual website behaviour analysis task to obtain more valuable information for web optimization and provide more valuable sequence pattern information for practical problems. This work also opens a new pathway toward more efficient sequential pattern mining tasks

    A Survey of Sequential Pattern Based E-Commerce Recommendation Systems

    Get PDF
    E-commerce recommendation systems usually deal with massive customer sequential databases, such as historical purchase or click stream sequences. Recommendation systems’ accuracy can be improved if complex sequential patterns of user purchase behavior are learned by integrating sequential patterns of customer clicks and/or purchases into the user–item rating matrix input of collaborative filtering. This review focuses on algorithms of existing E-commerce recommendation systems that are sequential pattern-based. It provides a comprehensive and comparative performance analysis of these systems, exposing their methodologies, achievements, limitations, and potential for solving more important problems in this domain. The review shows that integrating sequential pattern mining of historical purchase and/or click sequences into a user–item matrix for collaborative filtering can (i) improve recommendation accuracy, (ii) reduce user–item rating data sparsity, (iii) increase the novelty rate of recommendations, and (iv) improve the scalability of recommendation systems

    Mining High Utility Patterns Over Data Streams

    Get PDF
    Mining useful patterns from sequential data is a challenging topic in data mining. An important task for mining sequential data is sequential pattern mining, which discovers sequences of itemsets that frequently appear in a sequence database. In sequential pattern mining, the selection of sequences is generally based on the frequency/support framework. However, most of the patterns returned by sequential pattern mining may not be informative enough to business people and are not particularly related to a business objective. In view of this, high utility sequential pattern (HUSP) mining has emerged as a novel research topic in data mining recently. The main objective of HUSP mining is to extract valuable and useful sequential patterns from data by considering the utility of a pattern that captures a business objective (e.g., profit, users interest). In HUSP mining, the goal is to find sequences whose utility in the database is no less than a user-specified minimum utility threshold. Nowadays, many applications generate a huge volume of data in the form of data streams. A number of studies have been conducted on mining HUSPs, but they are mainly intended for non-streaming data and thus do not take data stream characteristics into consideration. Mining HUSP from such data poses many challenges. First, it is infeasible to keep all streaming data in the memory due to the high volume of data accumulated over time. Second, mining algorithms need to process the arriving data in real time with one scan of data. Third, depending on the minimum utility threshold value, the number of patterns returned by a HUSP mining algorithm can be large and overwhelms the user. In general, it is hard for the user to determine the value for the threshold. Thus, algorithms that can find the most valuable patterns (i.e., top-k high utility patterns) are more desirable. Mining the most valuable patterns is interesting in both static data and data streams. To address these research limitations and challenges, this dissertation proposes techniques and algorithms for mining high utility sequential patterns over data streams. We work on mining HUSPs over both a long portion of a data stream and a short period of time. We also work on how to efficiently identify the most significant high utility patterns (namely, the top-k high utility patterns) over data streams. In the first part, we explore a fundamental problem that is how the limited memory space can be well utilized to produce high quality HUSPs over the entire data stream. An approximation algorithm, called MAHUSP, is designed which employs memory adaptive mechanisms to use a bounded portion of memory, to efficiently discover HUSPs over the entire data streams. The second part of the dissertation presents a new sliding window-based algorithm to discover recent high utility sequential patterns over data streams. A novel data structure named HUSP-Tree is proposed to maintain the essential information for mining recenT HUSPs. An efficient and single-pass algorithm named HUSP-Stream is proposed to generate recent HUSPs from HUSP-Tree. The third part addresses the problem of top-k high utility pattern mining over data streams. Two novel methods, named T-HUDS and T-HUSP, for finding top-k high utility patterns over a data stream are proposed. T-HUDS discovers top-k high utility itemsets and T-HUSP discovers top-k high utility sequential patterns over a data stream. T-HUDS is based on a compressed tree structure, called HUDS-Tree, that can be used to efficiently find potential top-k high utility itemsets over data streams. T-HUSP incrementally maintains the content of top-k HUSPs in a data stream in a summary data structure, named TKList, and discovers top-k HUSPs efficiently. All of the algorithms are evaluated using both synthetic and real datasets. The performances, including the running time, memory consumption, precision, recall and Fmeasure, are compared. In order to show the effectiveness and efficiency of the proposed methods in reallife applications, the fourth part of this dissertation presents applications of one of the proposed methods (i.e., MAHUSP) to extract meaningful patterns from a real web clickstream dataset and a real biosequence dataset. The utility-based sequential patterns are compared with the patterns in the frequency/support framework. The results show that high utility sequential pattern mining provides meaningful patterns in real-life applications

    IMPROVING RECOMMENDATION PERFORMANCE WITH USER INTEREST EVOLUTION PATTERNS

    Get PDF
    Effective recommendation is indispensable to customized or personalized services. Collaborative filtering approach is a salient technique to support automated recommendations, which relies on the profiles of customers to make recommendations to a target customer based on the neighbors with similar preferences. However, traditional collaborative recommendation techniques only use static information of customers’ preferences and ignore the evolution of their purchasing behaviours which contain valuable information for making recommendations. Thus, this study proposes an approach to increase the effectiveness of personalized recommendations by mining the sequence patterns from the evolving preferences of a target customer over time. The experimental results have shown that the proposed technique has improved the recommendation precision in comparison with collaborative filtering method based on Top k recommendation

    New Approach for Market Intelligence Using Artificial and Computational Intelligence

    Get PDF
    Small and medium sized retailers are central to the private sector and a vital contributor to economic growth, but often they face enormous challenges in unleashing their full potential. Financial pitfalls, lack of adequate access to markets, and difficulties in exploiting technology have prevented them from achieving optimal productivity. Market Intelligence (MI) is the knowledge extracted from numerous internal and external data sources, aimed at providing a holistic view of the state of the market and influence marketing related decision-making processes in real-time. A related, burgeoning phenomenon and crucial topic in the field of marketing is Artificial Intelligence (AI) that entails fundamental changes to the skillssets marketers require. A vast amount of knowledge is stored in retailers’ point-of-sales databases. The format of this data often makes the knowledge they store hard to access and identify. As a powerful AI technique, Association Rules Mining helps to identify frequently associated patterns stored in large databases to predict customers’ shopping journeys. Consequently, the method has emerged as the key driver of cross-selling and upselling in the retail industry. At the core of this approach is the Market Basket Analysis that captures knowledge from heterogeneous customer shopping patterns and examines the effects of marketing initiatives. Apriori, that enumerates frequent itemsets purchased together (as market baskets), is the central algorithm in the analysis process. Problems occur, as Apriori lacks computational speed and has weaknesses in providing intelligent decision support. With the growth of simultaneous database scans, the computation cost increases and results in dramatically decreasing performance. Moreover, there are shortages in decision support, especially in the methods of finding rarely occurring events and identifying the brand trending popularity before it peaks. As the objective of this research is to find intelligent ways to assist small and medium sized retailers grow with MI strategy, we demonstrate the effects of AI, with algorithms in data preprocessing, market segmentation, and finding market trends. We show with a sales database of a small, local retailer how our Åbo algorithm increases mining performance and intelligence, as well as how it helps to extract valuable marketing insights to assess demand dynamics and product popularity trends. We also show how this results in commercial advantage and tangible return on investment. Additionally, an enhanced normal distribution method assists data pre-processing and helps to explore different types of potential anomalies.Små och medelstora detaljhandlare är centrala aktörer i den privata sektorn och bidrar starkt till den ekonomiska tillväxten, men de möter ofta enorma utmaningar i att uppnå sin fulla potential. Finansiella svårigheter, brist på marknadstillträde och svårigheter att utnyttja teknologi har ofta hindrat dem från att nå optimal produktivitet. Marknadsintelligens (MI) består av kunskap som samlats in från olika interna externa källor av data och som syftar till att erbjuda en helhetssyn av marknadsläget samt möjliggöra beslutsfattande i realtid. Ett relaterat och växande fenomen, samt ett viktigt tema inom marknadsföring är artificiell intelligens (AI) som ställer nya krav på marknadsförarnas färdigheter. Enorma mängder kunskap finns sparade i databaser av transaktioner samlade från detaljhandlarnas försäljningsplatser. Ändå är formatet på dessa data ofta sådant att det inte är lätt att tillgå och utnyttja kunskapen. Som AI-verktyg erbjuder affinitetsanalys en effektiv teknik för att identifiera upprepade mönster som statistiska associationer i data lagrade i stora försäljningsdatabaser. De hittade mönstren kan sedan utnyttjas som regler som förutser kundernas köpbeteende. I detaljhandel har affinitetsanalys blivit en nyckelfaktor bakom kors- och uppförsäljning. Som den centrala metoden i denna process fungerar marknadskorgsanalys som fångar upp kunskap från de heterogena köpbeteendena i data och hjälper till att utreda hur effektiva marknadsföringsplaner är. Apriori, som räknar upp de vanligt förekommande produktkombinationerna som köps tillsammans (marknadskorgen), är den centrala algoritmen i analysprocessen. Trots detta har Apriori brister som algoritm gällande låg beräkningshastighet och svag intelligens. När antalet parallella databassökningar stiger, ökar också beräkningskostnaden, vilket har negativa effekter på prestanda. Dessutom finns det brister i beslutstödet, speciellt gällande metoder att hitta sällan förekommande produktkombinationer, och i att identifiera ökande popularitet av varumärken från trenddata och utnyttja det innan det når sin höjdpunkt. Eftersom målet för denna forskning är att hjälpa små och medelstora detaljhandlare att växa med hjälp av MI-strategier, demonstreras effekter av AI med hjälp av algoritmer i förberedelsen av data, marknadssegmentering och trendanalys. Med hjälp av försäljningsdata från en liten, lokal detaljhandlare visar vi hur Åbo-algoritmen ökar prestanda och intelligens i datautvinningsprocessen och hjälper till att avslöja värdefulla insikter för marknadsföring, framför allt gällande dynamiken i efterfrågan och trender i populariteten av produkterna. Ytterligare visas hur detta resulterar i kommersiella fördelar och konkret avkastning på investering. Dessutom hjälper den utvidgade normalfördelningsmetoden i förberedelsen av data och med att hitta olika slags anomalier

    A Taxonomy of Sequential Patterns Based Recommendation Systems

    Get PDF
    With remarkable expansion of information through the internet, users prefer to receive the exact information they need through some suggestions to save their time and money. Thus, recommendation systems have become the heart of business strategies of E-commerce as they can increase sales and revenue as well as customer loyalty. Recommendation systems techniques provide suggestions for items/products to be purchased, rented or used by a user. The most common type of recommendation system technique is Collaborative Filtering (CF), which takes user’s interest in an item (explicit rating) as input in a matrix known as the user-item rating matrix, and produces an output for unknown ratings of users for items from which top N recommended items for target users are defined. E-commerce recommendation systems usually deal with massive customer sequential databases such as historical purchase or click sequences. The time stamp of a click or purchase event is an important attribute of each dataset as the time interval between item purchases may be useful to learn the next items for purchase by users. Sequential Pattern Mining mines frequent or high utility sequential patterns from a sequential database. Recommendation systems accuracy will be improved if complex sequential patterns of user purchase behavior are learned by integrating sequential patterns of customer clicks and/or purchases into the user-item rating matrix input. Thus, integrating collaborative filtering (CF) and sequential pattern mining (SPM) of historical clicks and purchase data can improve recommendation accuracy, diversity and quality and this survey focuses on review of existing recommendation systems that are sequential pattern based exposing their methodologies, achievements, limitations, and potentials for solving more problems in this domain. This thesis provides a comprehensive and comparative study of the existing Sequential Pattern-based E-commerce recommendation systems (SP-based E-commerce RS) such as ChoRec05, ChenRec09, HuangRec09, LiuRec09, ChoiRec12, Hybrid Model RecSys16, Product RecSys16, SainiRec17, HPCRec18 and HSPCRec19. Thesis shows that integrating sequential patterns mining (SPM) of historical purchase and/or click sequences into user-item matrix for collaborative filtering (CF) (i) Improved recommendation accuracy (ii) Reduced limiting user-item rating data Sparsity (iii) Increased Novelty Rate of the recommendations and (iv) Improved Scalability of the recommendation system. Thus, the importance of sequential patterns of customer behavior in improving the quality of recommendation systems for the application domain of E-commerce is accentuated through this survey by having a comparative performance analysis of the surveyed systems

    Techniques for improving clustering and association rules mining from very large transactional databases

    Get PDF
    Clustering and association rules mining are two core data mining tasks that have been actively studied by data mining community for nearly two decades. Though many clustering and association rules mining algorithms have been developed, no algorithm is better than others on all aspects, such as accuracy, efficiency, scalability, adaptability and memory usage. While more efficient and effective algorithms need to be developed for handling the large-scale and complex stored datasets, emerging applications where data takes the form of streams pose new challenges for the data mining community. The existing techniques and algorithms for static stored databases cannot be applied to the data streams directly. They need to be extended or modified, or new methods need to be developed to process the data streams.In this thesis, algorithms have been developed for improving efficiency and accuracy of clustering and association rules mining on very large, high dimensional, high cardinality, sparse transactional databases and data streams.A new similarity measure suitable for clustering transactional data is defined and an incremental clustering algorithm, INCLUS, is proposed using this similarity measure. The algorithm only scans the database once and produces clusters based on the user’s expectations of similarities between transactions in a cluster, which is controlled by the user input parameters, a similarity threshold and a support threshold. Intensive testing has been performed to evaluate the effectiveness, efficiency, scalability and order insensitiveness of the algorithm.To extend INCLUS for transactional data streams, an equal-width time window model and an elastic time window model are proposed that allow mining of clustering changes in evolving data streams. The minimal width of the window is determined by the minimum clustering granularity for a particular application. Two algorithms, CluStream_EQ and CluStream_EL, based on the equal-width window model and the elastic window model respectively, are developed by incorporating these models into INCLUS. Each algorithm consists of an online micro-clustering component and an offline macro-clustering component. The online component writes summary statistics of a data stream to the disk, and the offline components uses those summaries and other user input to discover changes in a data stream. The effectiveness and scalability of the algorithms are evaluated by experiments.This thesis also looks into sampling techniques that can improve efficiency of mining association rules in a very large transactional database. The sample size is derived based on the binomial distribution and central limit theorem. The sample size used is smaller than that based on Chernoff Bounds, but still provides the same approximation guarantees. The accuracy of the proposed sampling approach is theoretically analyzed and its effectiveness is experimentally evaluated on both dense and sparse datasets.Applications of stratified sampling for association rules mining is also explored in this thesis. The database is first partitioned into strata based on the length of transactions, and simple random sampling is then performed on each stratum. The total sample size is determined by a formula derived in this thesis and the sample size for each stratum is proportionate to the size of the stratum. The accuracy of transaction size based stratified sampling is experimentally compared with that of random sampling.The thesis concludes with a summary of significant contributions and some pointers for further work

    Exploring fish purchasing behaviour using data analytics

    Get PDF
    Nas últimas décadas têm ocorrido mudanças significativas no setor do retalho resultantes da globalização, do aumento de competitividade e da transformação do comportamento de compra do consumidor. Esta mudança de paradigma também se aplica ao setor do peixe fresco, que tem sido alvo do interesse de investigadores internacionais por razões políticas e económicas. Tendo em conta este ambiente competitivo, que valoriza a qualidade e o serviço fornecido ao consumidor assente em custos aceitáveis, é necessário a adoção de estratégias focadas no cliente. Esta dissertação está integrada no projeto ValorMar, que nasceu do compromisso de um conjunto alargado de entidades, desde empresas até centros de investigação posicionados pela relevância da economia marítima na cadeia de valor do pescado. Assim, esta dissertação irá tentar compreender relações que se revelem críticas para a tomada de decisão dos consumidores no momento de compra de peixe fresco. Para tal, irão ser usados dados transacionais e técnicas de data mining adequadas ao problema.A metodologia proposta por esta dissertação tem como objetivo não só a identificação de clientes recorrendo a técnicas de segmentação, mas também uma análise ao carrinho de compras de um cliente de peixe fresco. Estas análises aos dados irão mostrar que a extração de conhecimento de grandes bases de dados permite melhorar as decisões estratégicas das empresas e a sua relação com os clientes.In the last decades there have been significant changes in the retail sector resulting from globalization, the increased competitiveness and transformation on consumer's purchasing behaviour. This paradigm shift also applies to the fish sector, that has been capturing the interest of researchers internationally for political and economic reasons. Taking this competitive environment into account, which values the quality and the service given to the customer based on acceptable costs, it is necessary to adopt customer focused strategies.This thesis is integrated in the ValorMar's project, which was born from the commitment of a broad spectrum of entities, from companies to research centers, positioned by the relevance of the sea economy in the fishery value chain. Thus, this dissertation will try to understand critical relations for the decision making of customers when buying fresh fish.For this, transactional data and data mining techniques appropriate to the problem will be used.The methodology proposed by this thesis aims not only to identify customers using clustering techniques, but also to analyze the market basket of a fresh fish customer. These data analyzis will show that the knowledge extraction from large databases allows to improve the companies strategic decisions and their relationship with customers

    User-Specific Bicluster-based Collaborative Filtering

    Get PDF
    Tese de mestrado, Ciência de Dados, Universidade de Lisboa, Faculdade de Ciências, 2020Collaborative Filtering is one of the most popular and successful approaches for Recommender Systems. However, some challenges limit the effectiveness of Collaborative Filtering approaches when dealing with recommendation data, mainly due to the vast amounts of data and their sparse nature. In order to improve the scalability and performance of Collaborative Filtering approaches, several authors proposed successful approaches combining Collaborative Filtering with clustering techniques. In this work, we study the effectiveness of biclustering, an advanced clustering technique that groups rows and columns simultaneously, in Collaborative Filtering. When applied to the classic U-I interaction matrices, biclustering considers the duality relations between users and items, creating clusters of users who are similar under a particular group of items. We propose USBCF, a novel biclustering-based Collaborative Filtering approach that creates user specific models to improve the scalability of traditional CF approaches. Using a realworld dataset, we conduct a set of experiments to objectively evaluate the performance of the proposed approach, comparing it against baseline and state-of-the-art Collaborative Filtering methods. Our results show that the proposed approach can successfully suppress the main limitation of the previously proposed state-of-the-art biclustering-based Collaborative Filtering (BBCF) since BBCF can only output predictions for a small subset of the system users and item (lack of coverage). Moreover, USBCF produces rating predictions with quality comparable to the state-of-the-art approaches
    corecore