9 research outputs found

    Max-FISM: Mining (recently) maximal frequent itemsets over data streams using the sliding window model

    Get PDF
    AbstractFrequent itemset mining from data streams is an important data mining problem with broad applications such as retail market data analysis, network monitoring, web usage mining, and stock market prediction. However, it is also a difficult problem due to the unbounded, high-speed and continuous characteristics of streaming data. Therefore, extracting frequent itemsets from more recent data can enhance the analysis of stream data. In this paper, we propose an efficient algorithm, called Max-FISM (Maximal-Frequent Itemsets Mining), for mining recent maximal frequent itemsets from a high-speed stream of transactions within a sliding window. According to our algorithm, whenever a new transaction is inserted in the current window only its maximum itemset should be inserted into a prefix tree-based summary data structure called Max-Set for maintaining the number of independent appearance of each transaction in the current window. Finally, the set of recent maximal frequent itemsets is obtained from the current Max-Set. Experimental studies show that the proposed Max-FISM algorithm is highly efficient in terms of memory and time complexity for mining recent maximal frequent itemsets over high-speed data streams

    Distributed Frequent Item Sets Mining over P2P Networks

    Get PDF
    Data intensive peer-to-peer (P2P) networks are becoming increasingly popular in applications like social networking, file sharing networks, etc. Data mining in such P2P environments is the new generation of advanced P2P applications. Unfortunately, most of the existing data mining algorithms do not fit well in such environments since they require data that can be accessed in its entirety. It also is not easy due to the requirements of online transactional data streams. In this paper, we have developed a local algorithm for tracing frequent item sets over a P2P network. The performance of the proposed algorithm is comparatively tested and analyzed through a series of experiments

    Collaborative Planning and Event Monitoring Over Supply Chain Network

    Get PDF
    The shifting paradigm of supply chain management is manifesting increasing reliance on automated collaborative planning and event monitoring through information-bounded interaction across organizations. An end-to-end support for the course of actions is turning vital in faster incident response and proactive decision making. Many current platforms exhibit limitations to handle supply chain planning and monitoring in decentralized setting where participants may divide their responsibilities and share computational load of the solution generation. In this thesis, we investigate modeling and solution generation techniques for shared commodity delivery planning and event monitoring problems in a collaborative setting. In particular, we first elaborate a new model of Multi-Depot Vehicle Routing Problem (MDVRP) to jointly serve customer demands using multiple vehicles followed by a heuristic technique to search near-optimal solutions for such problem instances. Secondly, we propose two distributed mechanisms, namely: Passive Learning and Active Negotiation, to find near-optimal MDVRP solutions while executing the heuristic algorithm at the participant's side. Thirdly, we illustrate a collaboration mechanism to cost-effectively deploy execution monitors over supply chain network in order to collect in-field plan execution data. Finally, we describe a distributed approach to collaboratively monitor associations among recent events from an incoming stream of plan execution data. Experimental results over known datasets demonstrate the efficiency of the approaches to handle medium and large problem instances. The work has also produced considerable knowledge on the collaborative transportation planning and execution event monitoring

    Mining Meaning from Text by Harvesting Frequent and Diverse Semantic Itemsets

    Get PDF
    Abstract. In this paper, we present a novel and completely-unsupervised approach to unravel meanings (or senses) from linguistic constructions found in large corpora by introducing the concept of semantic vector. A semantic vector is a space-transformed vector where features repre-sent fine-grained semantic information units, instead of values of co-occurrences within a collection of texts. More in detail, instead of seeing words as vectors of frequency values, we propose to first explode words into a multitude of tiny semantic information retrieved from existing re-sources like WordNet and ConceptNet, and then clustering them into frequent and diverse patterns. This way, on the one hand, we are able to model linguistic data with a larger but much more dense and informa-tive semantic feature space. On the other hand, being the model based on basic and conceptual information, we are also able to generate new data by querying the above-mentioned semantic resources with the fea-tures contained in the extracted patterns. We experimented the idea on a dataset of 640 millions of triples subject-verb-object to automatically inducing senses for specific input verbs, demonstrating the validity and the potential of the presented approach in modeling and understanding natural language

    Modelling Web Usage in a Changing Environment

    Get PDF
    Eiben, A.E. [Promotor]Kowalczyk, W. [Copromotor

    A Rule Mining-Based Advanced Persistent Threats Detection System

    Get PDF

    Enhancing the Prediction of Missing Targeted Items from the Transactions of Frequent, Known Users

    Get PDF
    The ability for individual grocery retailers to have a single view of its customers across all of their grocery purchases remains elusive, and is considered the “holy grail” of grocery retailing. This has become increasingly important in recent years, especially in the UK, where competition has intensified, shopping habits and demographics have changed, and price sensitivity has increased. Whilst numerous studies have been conducted on understanding independent items that are frequently bought together, there has been little research conducted on using this knowledge of frequent itemsets to support decision making for targeted promotions. Indeed, having an effective targeted promotions approach may be seen as an outcome of the “holy grail”, as it will allow retailers to promote the right item, to the right customer, using the right incentives to drive up revenue, profitability, and customer share, whilst minimising costs. Given this, the key and original contribution of this study is the development of the market target (mt) model, the clustering approach, and the computer-based algorithm to enhance targeted promotions. Tests conducted on large scale consumer panel data, with over 32000 customers and 51 million individual scanned items per year, show that the mt model and the clustering approach successfully identifies both the best items, and customers to target. Further, the algorithm segregates customers into differing categories of loyalty, in this case it is four, to enable retailers to offer customised incentives schemes to each group, thereby enhancing customer engagement, whilst preventing unnecessary revenue erosion. The proposed model is compared with both a recently published approach, and the cross-sectional shopping patterns of the customers on the consumer scanner panel. Tests show that the proposed approach outperforms the other approach in that it significantly reduces the probability of having “false negatives” and “false positives” in the target customer set. Tests also show that the customer segmentation approach is effective, in that customers who are classed as highly loyal to a grocery retailer, are indeed loyal, whilst those that are classified as “switchers” do indeed have low levels of loyalty to the selected grocery retailer. Applying the mt model to other fields has not only been novel but yielded success. School attendance is improved with the aid of the mt model being applied to attendance data. In this regard, an action research study, involving the proposed mt model and approach, conducted at a local UK primary school, has resulted in the school now meeting the required attendance targets set by the government, and it has halved its persistent absenteeism for the first time in four years. In medicine, the mt model is seen as a useful tool that could rapidly uncover associations that may lead to new research hypotheses, whilst in crime prevention, the mt value may be used as an effective, tangible, efficiency metric that will lead to enhanced crime prevention outcomes, and support stronger community engagement. Future work includes the development of a software program for improving school attendance that will be offered to all schools, while further progress will be made on demonstrating the effectiveness of the mt value as a tangible crime prevention metric

    Benefits of the application of web-mining methods and techniques for the field of analytical customer relationship management of the marketing function in a knowledge management perspective

    Get PDF
    Le Web Mining (WM) reste une technologie relativement méconnue. Toutefois, si elle est utilisée adéquatement, elle s'avère être d'une grande utilité pour l'identification des profils et des comportements des clients prospects et existants, dans un contexte internet. Les avancées techniques du WM améliorent grandement le volet analytique de la Gestion de la Relation Client (GRC). Cette étude suit une approche exploratoire afin de déterminer si le WM atteint, à lui seul, tous les objectifs fondamentaux de la GRC, ou le cas échéant, devrait être utilisé de manière conjointe avec la recherche marketing traditionnelle et les méthodes classiques de la GRC analytique (GRCa) pour optimiser la GRC, et de fait le marketing, dans un contexte internet. La connaissance obtenue par le WM peut ensuite être administrée au sein de l'organisation dans un cadre de Gestion de la Connaissance (GC), afin d'optimiser les relations avec les clients nouveaux et/ou existants, améliorer leur expérience client et ultimement, leur fournir de la meilleure valeur. Dans un cadre de recherche exploratoire, des entrevues semi-structurés et en profondeur furent menées afin d'obtenir le point de vue de plusieurs experts en (web) data rnining. L'étude révéla que le WM est bien approprié pour segmenter les clients prospects et existants, pour comprendre les comportements transactionnels en ligne des clients existants et prospects, ainsi que pour déterminer le statut de loyauté (ou de défection) des clients existants. Il constitue, à ce titre, un outil d'une redoutable efficacité prédictive par le biais de la classification et de l'estimation, mais aussi descriptive par le biais de la segmentation et de l'association. En revanche, le WM est moins performant dans la compréhension des dimensions sous-jacentes, moins évidentes du comportement client. L'utilisation du WM est moins appropriée pour remplir des objectifs liés à la description de la manière dont les clients existants ou prospects développent loyauté, satisfaction, défection ou attachement envers une enseigne sur internet. Cet exercice est d'autant plus difficile que la communication multicanale dans laquelle évoluent les consommateurs a une forte influence sur les relations qu'ils développent avec une marque. Ainsi le comportement en ligne ne serait qu'une transposition ou tout du moins une extension du comportement du consommateur lorsqu'il n'est pas en ligne. Le WM est également un outil relativement incomplet pour identifier le développement de la défection vers et depuis les concurrents ainsi que le développement de la loyauté envers ces derniers. Le WM nécessite toujours d'être complété par la recherche marketing traditionnelle afin d'atteindre ces objectives plus difficiles mais essentiels de la GRCa. Finalement, les conclusions de cette recherche sont principalement dirigées à l'encontre des firmes et des gestionnaires plus que du côté des clients-internautes, car ces premiers plus que ces derniers possèdent les ressources et les processus pour mettre en œuvre les projets de recherche en WM décrits.\ud ______________________________________________________________________________ \ud MOTS-CLÉS DE L’AUTEUR : Web mining, Gestion de la connaissance, Gestion de la relation client, Données internet, Comportement du consommateur, Forage de données, Connaissance du consommateu

    MFIS—Mining Frequent Itemsets on Data Streams

    No full text
    corecore