7,114 research outputs found
New probabilistic interest measures for association rules
Mining association rules is an important technique for discovering meaningful
patterns in transaction databases. Many different measures of interestingness
have been proposed for association rules. However, these measures fail to take
the probabilistic properties of the mined data into account. In this paper, we
start by presenting a simple probabilistic framework for transaction data
which can be used to simulate transaction data when no associations are
present. We use such data and a real-world database from a grocery outlet to
explore the behavior of confidence and lift, two popular interest measures used
for rule mining. The results show that confidence is systematically influenced
by the frequency of the items in the left-hand side of rules and that lift
performs poorly at filtering random noise in transaction data. Based on the
probabilistic framework we develop two new interest measures, hyper-lift and
hyper-confidence, which can be used to filter or order mined association rules.
The new measures show significantly better performance than lift for
applications where spurious rules are problematic.
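The measures the abstract discusses can be made concrete with a small sketch. The toy transactions below are invented for illustration; only the standard definitions of support, confidence, and lift are shown, not the paper's hyper-lift or hyper-confidence measures.

```python
# Toy transaction database; each transaction is a set of items.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk"},
    {"bread"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / n

def confidence(lhs, rhs):
    """Estimated P(rhs | lhs): support of the whole rule over support of lhs."""
    return support(lhs | rhs) / support(lhs)

def lift(lhs, rhs):
    """Observed co-occurrence relative to what independence would predict."""
    return confidence(lhs, rhs) / support(rhs)

rule_conf = confidence({"milk"}, {"bread"})  # 2/5 divided by 3/5 = 2/3
rule_lift = lift({"milk"}, {"bread"})        # (2/3) / (4/5) = 5/6
```

Because lift divides by the right-hand side's support, a lift near 1 (as here) signals co-occurrence close to chance, which is exactly the regime where the paper finds it filters noise poorly.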
A Model-Based Frequency Constraint for Mining Associations from Transaction Data
Mining frequent itemsets is a popular method for finding associated items in
databases. For this method, support, the co-occurrence frequency of the items
which form an association, is used as the primary indicator of the
association's significance. A single user-specified support threshold is used
to decide if associations should be further investigated. Support has some
known problems with rare items, favors shorter itemsets and sometimes produces
misleading associations.
In this paper we develop a novel model-based frequency constraint as an
alternative to a single, user-specified minimum support. The constraint
utilizes knowledge of the process generating transaction data by applying a
simple stochastic mixture model (the NB model) which allows for transaction
data's typically highly skewed item frequency distribution. A user-specified
precision threshold is used together with the model to find local frequency
thresholds for groups of itemsets. Based on the constraint we develop the
notion of NB-frequent itemsets and adapt a mining algorithm to find all
NB-frequent itemsets in a database. In experiments with publicly available
transaction databases we show that the new constraint provides improvements
over a single minimum support threshold and that the precision threshold is
more robust and easier for the user to set and interpret.
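The idea of turning a precision threshold into a frequency threshold via a count model can be sketched loosely. This is not the paper's algorithm: the negative binomial parameters, the precision value, and the helper names below are invented; the sketch only shows how a model's tail probability can yield a local count threshold.

```python
from math import comb

def nb_pmf(k, r, p):
    """Negative binomial pmf: probability of k failures before the r-th success."""
    return comb(k + r - 1, k) * p**r * (1 - p)**k

def nb_frequency_threshold(r, p, precision=0.01):
    """Smallest count c such that P(X >= c) < precision for X ~ NB(r, p).

    Counts at or above this threshold are 'surprisingly frequent' under the
    model at the chosen precision level.
    """
    tail = 1.0  # P(X >= 0)
    c = 0
    while tail >= precision:
        tail -= nb_pmf(c, r, p)  # tail becomes P(X >= c + 1)
        c += 1
    return c

# For r=1, p=0.5 the NB reduces to a geometric: P(X >= c) = 0.5**c,
# so precision 0.01 gives threshold 7 (0.5**7 ≈ 0.0078 < 0.01).
threshold = nb_frequency_threshold(1, 0.5, precision=0.01)
```

A per-group threshold derived this way adapts to skewed item frequencies, which is the motivation the abstract gives for replacing a single global minimum support.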
A Product Affinity Segmentation Framework
Product affinity segmentation discovers the links between customers and products for cross-selling and promotion opportunities to increase sales and profits. However, conventional approaches face several challenges. The most straightforward approach is to use product-level data for customer segmentation, but it results in less meaningful solutions. Moreover, customer segmentation becomes challenging on massive datasets due to the computational complexity of traditional clustering methods. As an alternative, market basket analysis may suffer from association rules that are too general to be relevant for important segments. In this paper, we propose to partition customers and discover associated products simultaneously by detecting communities in the customer-product bipartite graph using the Louvain algorithm, which has good interpretability in this context. Through post-clustering analysis, we show that this framework generates statistically distinct clusters and identifies the associated products relevant to each cluster. Our analysis provides greater insight into customer purchase behaviors, potentially supporting strategic personalization planning (e.g., customized product recommendation) and increased profitability. Our case study of a large U.S. retailer provides useful management insights. Moreover, the graph application, based on almost 800,000 sales transactions, finished in 7.5 seconds on a standard PC, demonstrating its computational efficiency and suitability for big data.
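The bipartite community-detection step can be sketched with NetworkX's Louvain implementation. The customer/product labels and edge weights below are invented, and two cleanly separable communities are planted so the result is easy to read; the paper's graph is built from ~800,000 real transactions.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Invented customer-product bipartite graph; an edge weight could be the
# number of purchases of that product by that customer.
G = nx.Graph()
G.add_weighted_edges_from([
    ("c1", "p1", 3), ("c1", "p2", 2), ("c2", "p1", 1), ("c2", "p2", 4),
    ("c3", "p3", 5), ("c3", "p4", 1), ("c4", "p3", 2), ("c4", "p4", 3),
])

# Louvain partitions the whole bipartite graph, so each community contains
# a customer segment together with its associated products.
communities = louvain_communities(G, weight="weight", seed=42)
```

Because customers and products are clustered jointly, each community directly pairs a segment with the products that define it, which is the interpretability advantage the abstract points to.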
The structure of Inter-Urban traffic: A weighted network analysis
We study the structure of the network representing the interurban commuting
traffic of the Sardinia region, Italy, which amounts to 375 municipalities and
1,600,000 inhabitants. We use a weighted network representation where vertices
correspond to towns and the edges to the actual commuting flows among them. We
characterize quantitatively both the topological and weighted properties of the
resulting network. Interestingly, the statistical properties of commuting
traffic exhibit complex features and non-trivial relations with the underlying
topology. We characterize quantitatively the traffic backbone among large
cities and we give evidence of a very high heterogeneity of the commuter
flows around large cities. We also discuss the interplay between the
topological and dynamical properties of the network as well as their relation
with socio-demographic variables such as population and monthly income. This
analysis may be useful at various stages in environmental planning and provides
analytical tools for a wide spectrum of applications ranging from impact
evaluation to decision-making and planning support.
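The basic weighted-network bookkeeping behind such an analysis can be sketched with the standard library. The towns are real Sardinian municipalities, but the flow values are invented; the point is only the distinction between a town's degree (number of connections) and its strength (total commuting flow), which drives the topology-traffic interplay the abstract describes.

```python
from collections import defaultdict

# Invented undirected commuting flows between towns (weight = commuters).
flows = {
    ("Cagliari", "Quartu"): 12000,
    ("Cagliari", "Sassari"): 800,
    ("Sassari", "Alghero"): 5000,
    ("Cagliari", "Alghero"): 300,
}

degree = defaultdict(int)    # number of towns a town exchanges commuters with
strength = defaultdict(int)  # total commuting flow through the town

for (u, v), w in flows.items():
    degree[u] += 1
    degree[v] += 1
    strength[u] += w
    strength[v] += w
```

Comparing strength against degree across towns is one simple way to expose the heterogeneity of flows around large cities: a hub like Cagliari here has moderate degree but disproportionately high strength.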
Complex Politics: A Quantitative Semantic and Topological Analysis of UK House of Commons Debates
This study is a first, exploratory attempt to use quantitative semantics
techniques and topological analysis to analyze systemic patterns arising in a
complex political system. In particular, we use a rich data set covering all
speeches and debates in the UK House of Commons between 1975 and 2014. By the
use of dynamic topic modeling (DTM) and topological data analysis (TDA) we show
that both members and parties feature specific roles within the system,
consistent over time, and extract global patterns indicating levels of
political cohesion. Our results provide a wide array of novel hypotheses about
the complex dynamics of political systems, with valuable policy applications.
Review and Analysis of Pain Research Literature through Keyword Co-occurrence Networks
Pain is a significant public health problem as the number of individuals with
a history of pain globally keeps growing. In response, many synergistic
research areas have been coming together to address pain-related issues. This
work conducts a review and analysis of a vast body of pain-related literature
using the keyword co-occurrence network (KCN) methodology. In this method, a
set of KCNs is constructed by treating keywords as nodes and the co-occurrence
of keywords as links between the nodes. Since keywords represent the knowledge
components of research articles, analysis of KCNs will reveal the knowledge
structure and research trends in the literature. This study extracted and
analyzed keywords from 264,560 pain-related research articles indexed in IEEE,
PubMed, Engineering Village, and Web of Science published between 2002 and
2021. We observed rapid growth in pain literature in the last two decades: the
number of articles has grown nearly threefold, and the number of keywords has
grown by a factor of 7. We identified emerging and declining research trends in
sensors/methods, biomedical, and treatment tracks. We also extracted the most
frequently co-occurring keyword pairs and clusters to help researchers
recognize the synergies among different pain-related topics.
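The KCN construction the abstract describes, keywords as nodes and co-occurrences as weighted links, can be sketched in a few lines. The keyword lists below are invented stand-ins for the 264,560 indexed articles.

```python
from collections import Counter
from itertools import combinations

# Invented per-article keyword lists; in the study these come from the
# indexed metadata of pain-related articles.
articles = [
    ["chronic pain", "opioids", "machine learning"],
    ["chronic pain", "machine learning", "wearable sensors"],
    ["opioids", "treatment"],
]

# Each article contributes one link per unordered keyword pair; repeated
# co-occurrence across articles accumulates as the edge weight.
edge_weights = Counter()
for kws in articles:
    for a, b in combinations(sorted(set(kws)), 2):
        edge_weights[(a, b)] += 1
```

Edge weights then rank the most frequently co-occurring keyword pairs, and standard clustering on the resulting weighted graph yields the topic clusters the study reports.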