Max-FISM: Mining (recently) maximal frequent itemsets over data streams using the sliding window model
Frequent itemset mining from data streams is an important data mining problem with broad applications such as retail market data analysis, network monitoring, web usage mining, and stock market prediction. However, it is also a difficult problem due to the unbounded, high-speed, and continuous nature of streaming data. Therefore, extracting frequent itemsets from more recent data can enhance the analysis of stream data. In this paper, we propose an efficient algorithm, called Max-FISM (Maximal-Frequent Itemsets Mining), for mining recent maximal frequent itemsets from a high-speed stream of transactions within a sliding window. In our algorithm, whenever a new transaction is inserted into the current window, only its maximum itemset is inserted into a prefix tree-based summary data structure, called Max-Set, which maintains the number of independent appearances of each transaction in the current window. Finally, the set of recent maximal frequent itemsets is obtained from the current Max-Set. Experimental studies show that the proposed Max-FISM algorithm is highly efficient in terms of memory and time for mining recent maximal frequent itemsets over high-speed data streams.
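The sliding-window idea can be illustrated with a toy sketch. It is not the Max-FISM implementation: the paper's prefix-tree Max-Set is replaced here by a plain `Counter` keyed by whole transactions (an assumption made for brevity), and maximal frequent itemsets are found by brute-force enumeration, which is only practical for tiny inputs. The class and method names are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowMaxSet:
    """Toy stand-in for Max-Set (assumption): a Counter keyed by whole
    transactions replaces the paper's prefix tree."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()    # transactions currently in the window, oldest first
        self.counts = Counter()  # independent appearances of each distinct transaction

    def insert(self, transaction):
        t = frozenset(transaction)
        self.window.append(t)
        self.counts[t] += 1
        # Slide: evict the oldest transaction once the window overflows.
        if len(self.window) > self.window_size:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def support(self, itemset):
        # An itemset occurs in every stored transaction that contains it.
        itemset = frozenset(itemset)
        return sum(c for t, c in self.counts.items() if itemset <= t)

    def maximal_frequent(self, min_support):
        # Brute-force candidates: every non-empty subset of a stored transaction.
        # Exponential in transaction length, so only suitable for toy inputs.
        candidates = set()
        for t in self.counts:
            items = sorted(t)
            for k in range(1, len(items) + 1):
                candidates.update(frozenset(c) for c in combinations(items, k))
        frequent = {s for s in candidates if self.support(s) >= min_support}
        # Maximal: frequent, with no frequent proper superset.
        return {s for s in frequent if not any(s < f for f in frequent)}


m = SlidingWindowMaxSet(window_size=3)
for tx in [["a", "b", "c"], ["a", "b"], ["a", "b", "d"], ["c"]]:
    m.insert(tx)
print(m.maximal_frequent(min_support=2))
```

After the fourth insertion the window holds {a, b}, {a, b, d} and {c}, so {a, b} is the only maximal itemset with support of at least 2. Storing each distinct transaction once with a counter is what lets support be recovered from the summary alone, which is the role the abstract assigns to Max-Set.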
Distributed Frequent Item Sets Mining over P2P Networks
Data-intensive peer-to-peer (P2P) networks are becoming increasingly popular in applications such as social networking and file sharing. Data mining in such P2P environments is the new generation of advanced P2P applications. Unfortunately, most existing data mining algorithms do not fit such environments well, since they require the data to be accessible in its entirety. The task is further complicated by the requirements of online transactional data streams. In this paper, we develop a local algorithm for tracing frequent item sets over a P2P network. The performance of the proposed algorithm is tested and comparatively analyzed through a series of experiments.
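The abstract does not describe how the local algorithm works. One observation that majority-voting-style local algorithms can build on is that global frequency decomposes into per-peer quantities: if peer i holds n_i transactions, count_i of which contain the itemset, then the itemset is frequent in the union of all peers' data exactly when the sum of the local excesses count_i − θ·n_i is non-negative. The sketch below, with hypothetical function names, only checks this identity by summing centrally; an actual local algorithm would exchange such values with neighbouring peers instead of collecting all data in one place.

```python
from fractions import Fraction


def local_excess(transactions, itemset, min_freq):
    """Peer-local statistic: local support count minus min_freq times local size."""
    itemset = frozenset(itemset)
    count = sum(1 for t in transactions if itemset <= frozenset(t))
    return count - min_freq * len(transactions)


def globally_frequent(peers, itemset, min_freq):
    # sum_i (count_i - min_freq * n_i) >= 0  <=>  sum_i count_i >= min_freq * sum_i n_i,
    # i.e. the itemset is frequent in the union of all peers' data.
    return sum(local_excess(p, itemset, min_freq) for p in peers) >= 0


peers = [
    [{"a", "b"}, {"a"}],         # peer 1's local transactions
    [{"b"}, {"a", "b"}, {"c"}],  # peer 2's local transactions
]
# {a, b} appears in 2 of the 5 transactions overall.
print(globally_frequent(peers, {"a", "b"}, Fraction(2, 5)))
```

Using `Fraction` for the threshold keeps the excesses exact, so a boundary case such as 2 of 5 at θ = 2/5 is not affected by floating-point rounding.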
Collaborative Planning and Event Monitoring Over Supply Chain Network
The shifting paradigm of supply chain management shows an increasing reliance on automated collaborative planning and event monitoring through information-bounded interaction across organizations. End-to-end support for the course of action is becoming vital for faster incident response and proactive decision making. Many current platforms are limited in their ability to handle supply chain planning and monitoring in a decentralized setting, where participants may divide their responsibilities and share the computational load of solution generation. In this thesis, we investigate modeling and solution generation techniques for shared commodity delivery planning and event monitoring problems in a collaborative setting. First, we elaborate a new model of the Multi-Depot Vehicle Routing Problem (MDVRP) in which multiple vehicles jointly serve customer demands, followed by a heuristic technique to search for near-optimal solutions to such problem instances. Second, we propose two distributed mechanisms, namely Passive Learning and Active Negotiation, to find near-optimal MDVRP solutions while executing the heuristic algorithm on the participants' side. Third, we illustrate a collaboration mechanism to cost-effectively deploy execution monitors over the supply chain network in order to collect in-field plan execution data. Finally, we describe a distributed approach to collaboratively monitor associations among recent events from an incoming stream of plan execution data. Experimental results on known datasets demonstrate the efficiency of the approaches in handling medium and large problem instances. The work has also produced considerable knowledge about collaborative transportation planning and execution event monitoring.
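To make the MDVRP setting concrete, here is a minimal sketch of a generic cluster-first, route-second greedy heuristic. This is not the thesis's heuristic, which the abstract does not specify. Assumed for illustration: Euclidean distances, a single vehicle capacity shared by all vehicles, and the hypothetical function name `mdvrp_greedy`.

```python
import math


def mdvrp_greedy(depots, customers, demand, capacity):
    """Cluster-first, route-second greedy heuristic for a toy MDVRP.

    depots, customers: {name: (x, y)}; demand: {customer: units}.
    Returns a list of routes, each starting and ending at its depot.
    """
    if any(d > capacity for d in demand.values()):
        raise ValueError("a single customer's demand exceeds vehicle capacity")

    # 1. Cluster: assign each customer to its nearest depot.
    clusters = {d: [] for d in depots}
    for c, pos in customers.items():
        nearest = min(depots, key=lambda d: math.dist(depots[d], pos))
        clusters[nearest].append(c)

    # 2. Route: per depot, dispatch vehicles one at a time and extend each route
    #    with the nearest unserved customer whose demand still fits the vehicle.
    routes = []
    for depot, members in clusters.items():
        unserved = set(members)
        while unserved:
            route, load, here = [depot], 0, depots[depot]
            while True:
                fits = [c for c in unserved if load + demand[c] <= capacity]
                if not fits:
                    break
                nxt = min(fits, key=lambda c: math.dist(here, customers[c]))
                route.append(nxt)
                load += demand[nxt]
                here = customers[nxt]
                unserved.remove(nxt)
            route.append(depot)  # return to the originating depot
            routes.append(route)
    return routes


depots = {"D1": (0, 0), "D2": (10, 0)}
customers = {"c1": (1, 0), "c2": (2, 0), "c3": (9, 0)}
demand = {"c1": 1, "c2": 1, "c3": 1}
print(mdvrp_greedy(depots, customers, demand, capacity=2))
# [['D1', 'c1', 'c2', 'D1'], ['D2', 'c3', 'D2']]
```

A greedy construction like this typically serves only as a starting point. Near-optimal methods of the kind the abstract mentions would then improve such routes, for example by moving customers between depots or vehicles.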
Mining Meaning from Text by Harvesting Frequent and Diverse Semantic Itemsets
In this paper, we present a novel and completely unsupervised approach to unravel meanings (or senses) from linguistic constructions found in large corpora by introducing the concept of a semantic vector. A semantic vector is a space-transformed vector whose features represent fine-grained semantic information units, instead of co-occurrence values within a collection of texts. More specifically, instead of seeing words as vectors of frequency values, we propose to first explode words into a multitude of fine-grained semantic information units retrieved from existing resources such as WordNet and ConceptNet, and then cluster them into frequent and diverse patterns. On the one hand, this allows us to model linguistic data with a larger but much denser and more informative semantic feature space. On the other hand, since the model is based on basic and conceptual information, we are also able to generate new data by querying the above-mentioned semantic resources with the features contained in the extracted patterns. We tested the idea on a dataset of 640 million subject-verb-object triples to automatically induce senses for specific input verbs, demonstrating the validity and potential of the presented approach for modeling and understanding natural language.
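The following toy sketch shows the "explode, then mine frequent patterns" step for a single verb. It covers only the frequency part. The diversity criterion and the clustering are not modelled. The small `LEXICON` dictionary is a hypothetical stand-in for WordNet/ConceptNet lookups, and the function names and `max_size` parameter are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon standing in for WordNet/ConceptNet lookups.
LEXICON = {
    "pizza": {"food", "edible", "artifact"},
    "apple": {"food", "edible", "fruit"},
    "car": {"vehicle", "artifact"},
    "bus": {"vehicle", "artifact"},
}


def semantic_vector(word):
    """'Explode' a word into its set of semantic feature units."""
    return frozenset(LEXICON.get(word, ()))


def frequent_feature_sets(objects, min_count, max_size=2):
    """Count feature combinations shared by the objects of one verb and
    keep those that occur at least min_count times."""
    counts = Counter()
    for obj in objects:
        features = sorted(semantic_vector(obj))
        for k in range(1, max_size + 1):
            for combo in combinations(features, k):
                counts[frozenset(combo)] += 1
    return {fs: n for fs, n in counts.items() if n >= min_count}


# Objects of "eat" taken from hypothetical subject-verb-object triples.
eat_patterns = frequent_feature_sets(["pizza", "apple"], min_count=2)
print(eat_patterns)
```

Here the surface words "pizza" and "apple" share no tokens, yet both map to the pattern {edible, food}. That shared pattern is the kind of signal a sense-induction step could group on.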
Modelling Web Usage in a Changing Environment
Eiben, A.E. [Promotor]; Kowalczyk, W. [Copromotor]
Enhancing the Prediction of Missing Targeted Items from the Transactions of Frequent, Known Users
The ability of individual grocery retailers to have a single view of their customers across
all of their grocery purchases remains elusive, and is considered the “holy grail” of
grocery retailing. This has become increasingly important in recent years, especially
in the UK, where competition has intensified, shopping habits and demographics have
changed, and price sensitivity has increased. Whilst numerous studies have investigated
independent items that are frequently bought together, there
has been little research on using this knowledge of frequent itemsets to support
decision making for targeted promotions. Indeed, having an effective targeted
promotions approach may be seen as an outcome of the “holy grail”, as it will allow
retailers to promote the right item, to the right customer, using the right incentives
to drive up revenue, profitability, and customer share, whilst minimising costs.
Given this, the key original contributions of this study are the development of the
market target (mt) model, the clustering approach, and the computer-based algorithm
to enhance targeted promotions. Tests conducted on large-scale consumer panel data,
with over 32,000 customers and 51 million individual scanned items per year, show
that the mt model and the clustering approach successfully identify both the best
items and the best customers to target. Further, the algorithm segregates customers into
different loyalty categories (four, in this case), enabling retailers to offer customised
incentive schemes to each group, thereby enhancing customer engagement whilst preventing
unnecessary revenue erosion.
The proposed model is compared with both a recently published approach and the cross-sectional
shopping patterns of the customers on the consumer scanner panel. Tests show that the proposed
approach outperforms the published approach in that it significantly reduces the probability of
having “false negatives” and “false positives” in the target customer set. Tests also
show that the customer segmentation approach is effective, in that customers who are
classed as highly loyal to a grocery retailer are indeed loyal, whilst those that are classified
as “switchers” do indeed have low levels of loyalty to the selected grocery retailer.
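The abstract reports fewer false negatives and false positives in the target customer set but does not give the exact definitions used. Below is a minimal sketch of one common way to measure them. Assumed for illustration: the ground truth is the set of customers who actually bought the promoted item, a false positive is a targeted non-buyer, and a false negative is a buyer who was not targeted. The function is hypothetical and does not implement the mt model.

```python
def target_set_errors(targeted, actual_buyers, all_customers):
    """Return (false-positive rate, false-negative rate) of a target customer set.

    False positive: targeted but did not buy. False negative: bought but not targeted.
    """
    targeted, buyers = set(targeted), set(actual_buyers)
    non_buyers = set(all_customers) - buyers
    # Share of non-buyers that were wrongly targeted.
    fp_rate = len(targeted - buyers) / len(non_buyers) if non_buyers else 0.0
    # Share of actual buyers that the target set missed.
    fn_rate = len(buyers - targeted) / len(buyers) if buyers else 0.0
    return fp_rate, fn_rate


panel = [f"c{i}" for i in range(1, 7)]
print(target_set_errors({"c1", "c2", "c4"}, {"c1", "c2", "c3"}, panel))
```

Normalising by the number of non-buyers and buyers, respectively, turns the raw error counts into rates. This makes target sets of different sizes comparable when two targeting approaches are evaluated against the same panel.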
Applying the mt model to other fields has not only been novel but has also proved successful.
For example, applying the mt model to attendance data has improved school attendance.
In this regard, an action research study involving the proposed mt
model and approach, conducted at a local UK primary school, has resulted in the
school now meeting the required attendance targets set by the government, and it has
halved its persistent absenteeism for the first time in four years. In medicine, the mt
model is seen as a useful tool that could rapidly uncover associations that may lead
to new research hypotheses, whilst in crime prevention, the mt value may be used as
an effective, tangible, efficiency metric that will lead to enhanced crime prevention
outcomes, and support stronger community engagement.
Future work includes the development of a software program for improving school
attendance that will be offered to all schools, while further progress will be made on
demonstrating the effectiveness of the mt value as a tangible crime prevention metric.
Benefits of the application of web-mining methods and techniques for the field of analytical customer relationship management of the marketing function in a knowledge management perspective
Web Mining (WM) remains a relatively little-known technology. However, when used appropriately, it proves highly useful for identifying the profiles and behaviours of prospective and existing customers in an internet context. Technical advances in WM greatly improve the analytical side of Customer Relationship Management (CRM). This study takes an exploratory approach to determine whether WM alone achieves all the fundamental objectives of CRM or whether, where appropriate, it should be used in conjunction with traditional marketing research and classical analytical CRM (aCRM) methods to optimize CRM, and thus marketing, in an internet context. The knowledge obtained through WM can then be managed within the organization in a Knowledge Management (KM) framework in order to optimize relationships with new and/or existing customers, improve their customer experience and, ultimately, provide them with better value. Within this exploratory research framework, in-depth semi-structured interviews were conducted to obtain the views of several experts in (web) data mining. The study revealed that WM is well suited to segmenting prospective and existing customers, to understanding the online transactional behaviour of existing and prospective customers, and to determining the loyalty (or defection) status of existing customers. As such, it is a formidably effective predictive tool through classification and estimation, and also a descriptive one through segmentation and association. On the other hand, WM is less effective at understanding the underlying, less obvious dimensions of customer behaviour.
WM is less suited to objectives related to describing how existing or prospective customers develop loyalty, satisfaction, defection, or attachment towards an online brand. This is all the more difficult because the multichannel communication environment in which consumers operate strongly influences the relationships they develop with a brand. Online behaviour would thus be merely a transposition, or at least an extension, of the consumer's offline behaviour. WM is also a relatively incomplete tool for identifying the development of defection to and from competitors, as well as the development of loyalty towards them. WM still needs to be complemented by traditional marketing research in order to achieve these more difficult but essential aCRM objectives. Finally, the conclusions of this research are aimed mainly at firms and managers rather than at customers (internet users), since the former, more than the latter, have the resources and processes to carry out the WM research projects described.
AUTHOR'S KEYWORDS: Web mining, Knowledge management, Customer relationship management, Internet data, Consumer behaviour, Data mining, Consumer knowledge