7,114 research outputs found

    New probabilistic interest measures for association rules

    Mining association rules is an important technique for discovering meaningful patterns in transaction databases. Many different measures of interestingness have been proposed for association rules, but these measures fail to take the probabilistic properties of the mined data into account. In this paper, we start by presenting a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such data and a real-world database from a grocery outlet to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left-hand side of rules and that lift performs poorly at filtering random noise in transaction data. Based on the probabilistic framework, we develop two new interest measures, hyper-lift and hyper-confidence, which can be used to filter or order mined association rules. The new measures show significantly better performance than lift for applications where spurious rules are problematic.
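    As a sketch of the two interest measures the abstract discusses, the standard definitions of confidence and lift can be computed directly from transaction counts. The toy transactions below are invented for illustration, not taken from the paper's grocery data, and the hyper-lift/hyper-confidence measures themselves are not reproduced here.

    ```python
    # Toy transaction database; item names are illustrative only.
    transactions = [
        {"milk", "bread"},
        {"milk", "bread", "butter"},
        {"bread", "butter"},
        {"milk"},
        {"bread"},
    ]

    def support(itemset, db):
        """Fraction of transactions containing every item in `itemset`."""
        return sum(1 for t in db if itemset <= t) / len(db)

    def confidence(lhs, rhs, db):
        """conf(X -> Y) = supp(X u Y) / supp(X)."""
        return support(lhs | rhs, db) / support(lhs, db)

    def lift(lhs, rhs, db):
        """lift(X -> Y) = conf(X -> Y) / supp(Y); 1.0 indicates independence."""
        return confidence(lhs, rhs, db) / support(rhs, db)
    ```

    Here conf({milk} -> {bread}) = 0.4 / 0.6 = 2/3, and since supp({bread}) = 0.8, the lift is 5/6: slightly below 1, i.e. "milk" weakly disfavors "bread" in this toy data.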

    A Model-Based Frequency Constraint for Mining Associations from Transaction Data

    Mining frequent itemsets is a popular method for finding associated items in databases. For this method, support, the co-occurrence frequency of the items which form an association, is used as the primary indicator of the association's significance. A single user-specified support threshold is used to decide whether associations should be further investigated. Support has known problems with rare items, favors shorter itemsets, and sometimes produces misleading associations. In this paper we develop a novel model-based frequency constraint as an alternative to a single, user-specified minimum support. The constraint utilizes knowledge of the process generating transaction data by applying a simple stochastic mixture model (the NB model) which accounts for the typically highly skewed item frequency distribution of transaction data. A user-specified precision threshold is used together with the model to find local frequency thresholds for groups of itemsets. Based on the constraint, we develop the notion of NB-frequent itemsets and adapt a mining algorithm to find all NB-frequent itemsets in a database. In experiments with publicly available transaction databases we show that the new constraint provides improvements over a single minimum support threshold and that the precision threshold is more robust and easier to set and interpret by the user.
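    The rare-item problem with a single global minimum support, which motivates the model-based constraint above, can be seen in a minimal sketch (invented toy data, not the paper's benchmarks): a perfectly associated but rare pair is filtered out by the same threshold that keeps common pairs.

    ```python
    from collections import Counter
    from itertools import combinations

    transactions = [
        {"a", "b"}, {"a", "b"}, {"a", "c"}, {"b"}, {"a", "b", "c"},
        {"d", "e"},  # rare but perfectly associated pair
    ]

    def frequent_pairs(db, min_support):
        """All 2-itemsets whose relative co-occurrence frequency meets min_support."""
        counts = Counter(
            pair for t in db for pair in combinations(sorted(t), 2)
        )
        n = len(db)
        return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

    result = frequent_pairs(transactions, 0.3)
    ```

    With min_support = 0.3, ("a", "b") survives at support 0.5, but ("d", "e") is discarded at support 1/6 even though "d" and "e" always occur together: exactly the situation where a locally adapted frequency threshold would behave differently.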

    A Product Affinity Segmentation Framework

    Product affinity segmentation discovers the links between customers and products for cross-selling and promotion opportunities to increase sales and profits. However, there are some challenges with conventional approaches. The most straightforward approach is to use product-level data for customer segmentation, but it results in less meaningful solutions. Moreover, customer segmentation becomes challenging on massive datasets due to the computational complexity of traditional clustering methods. As an alternative, market basket analysis may suffer from association rules too general to be relevant for important segments. In this paper, we propose to partition customers and discover associated products simultaneously by detecting communities in the customer-product bipartite graph using the Louvain algorithm, which has good interpretability in this context. Through post-clustering analysis, we show that this framework generates statistically distinct clusters and identifies associated products relevant to each cluster. Our analysis provides greater insights into customer purchase behaviors, potentially supporting strategic planning for personalization (e.g., customized product recommendation) and increased profitability. Our case study of a large U.S. retailer also provides useful management insights. Moreover, the graph application, based on almost 800,000 sales transactions, finished in 7.5 seconds on a standard PC, demonstrating its computational efficiency and suitability for big data applications.
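    The core idea, Louvain community detection on a customer-product bipartite graph, can be sketched with networkx (the customer/product names and purchases below are invented, and `louvain_communities` requires networkx >= 2.8):

    ```python
    import networkx as nx

    # Bipartite purchase graph: customers c*, products p*.
    # Two clearly separated buying patterns for illustration.
    purchases = [
        ("c1", "p1"), ("c1", "p2"), ("c2", "p1"), ("c2", "p2"),
        ("c3", "p3"), ("c3", "p4"), ("c4", "p3"), ("c4", "p4"),
    ]
    B = nx.Graph()
    B.add_edges_from(purchases)

    # Louvain partitions the graph into communities; each community mixes
    # customers with their associated products, which is what makes the
    # segmentation directly interpretable.
    communities = nx.community.louvain_communities(B, seed=42)
    segment = {node: i for i, com in enumerate(communities) for node in com}
    ```

    On this toy graph, c1 and c2 land in one community together with p1 and p2, and c3/c4 with p3/p4, so each detected community is a customer segment plus its affinity products.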

    The structure of inter-urban traffic: A weighted network analysis

    We study the structure of the network representing the interurban commuting traffic of the Sardinia region, Italy, which amounts to 375 municipalities and 1,600,000 inhabitants. We use a weighted network representation where vertices correspond to towns and edges to the actual commuting flows among them. We characterize quantitatively both the topological and weighted properties of the resulting network. Interestingly, the statistical properties of commuting traffic exhibit complex features and non-trivial relations with the underlying topology. We characterize quantitatively the traffic backbone among large cities and we give evidence of a very high heterogeneity of the commuter flows around large cities. We also discuss the interplay between the topological and dynamical properties of the network as well as their relation with socio-demographic variables such as population and monthly income. This analysis may be useful at various stages in environmental planning and provides analytical tools for a wide spectrum of applications ranging from impact evaluation to decision-making and planning support.
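    The basic weighted-network quantities this kind of analysis rests on are a node's degree (number of connected towns) and its strength (total commuter flow). A minimal sketch, where the town names are real Sardinian municipalities but the flow values are invented, not the paper's data:

    ```python
    from collections import defaultdict

    # Weighted edge list: (town_a, town_b, daily commuters). Values invented.
    flows = [
        ("Cagliari", "Quartu", 12000),
        ("Cagliari", "Sassari", 800),
        ("Sassari", "Alghero", 5000),
        ("Cagliari", "Selargius", 9000),
    ]

    def strength_and_degree(edges):
        """Strength s_i = total flow on edges at town i; degree k_i = number of neighbors."""
        strength, degree = defaultdict(float), defaultdict(int)
        for a, b, w in edges:
            strength[a] += w
            strength[b] += w
            degree[a] += 1
            degree[b] += 1
        return dict(strength), dict(degree)

    strength, degree = strength_and_degree(flows)
    ```

    Comparing how strength grows with degree across nodes is one standard way to expose the flow heterogeneity around large cities that the abstract mentions.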

    Complex Politics: A Quantitative Semantic and Topological Analysis of UK House of Commons Debates

    This study is a first, exploratory attempt to use quantitative semantic techniques and topological analysis to analyze systemic patterns arising in a complex political system. In particular, we use a rich data set covering all speeches and debates in the UK House of Commons between 1975 and 2014. Using dynamic topic modeling (DTM) and topological data analysis (TDA), we show that both members and parties feature specific roles within the system, consistent over time, and we extract global patterns indicating levels of political cohesion. Our results provide a wide array of novel hypotheses about the complex dynamics of political systems, with valuable policy applications.
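    Dynamic topic modeling extends ordinary LDA by fitting topics across time slices; the static building block can be sketched with scikit-learn on a toy "speech" corpus (the texts below are invented, and this is plain LDA, not the DTM variant the study uses):

    ```python
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Tiny illustrative corpus; a real DTM fits linked models per parliament/year.
    speeches = [
        "budget tax spending economy growth",
        "tax economy budget deficit spending",
        "health hospital doctors patients care",
        "patients care health hospital funding",
    ]

    X = CountVectorizer().fit_transform(speeches)

    # Two latent topics; doc_topics[d] is the topic mixture of speech d.
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    doc_topics = lda.fit_transform(X)
    ```

    Each row of `doc_topics` is a probability distribution over the two topics, and tracking how these per-speaker mixtures drift over time slices is the kind of signal a DTM-based cohesion analysis works from.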

    Review and Analysis of Pain Research Literature through Keyword Co-occurrence Networks

    Pain is a significant public health problem as the number of individuals with a history of pain keeps growing globally. In response, many synergistic research areas have been coming together to address pain-related issues. This work conducts a review and analysis of a vast body of pain-related literature using the keyword co-occurrence network (KCN) methodology. In this method, a set of KCNs is constructed by treating keywords as nodes and the co-occurrence of keywords as links between the nodes. Since keywords represent the knowledge components of research articles, analysis of KCNs reveals the knowledge structure and research trends in the literature. This study extracted and analyzed keywords from 264,560 pain-related research articles indexed in IEEE, PubMed, Engineering Village, and Web of Science published between 2002 and 2021. We observed rapid growth in pain literature in the last two decades: the number of articles has grown nearly threefold, and the number of keywords has grown by a factor of 7. We identified emerging and declining research trends in the sensors/methods, biomedical, and treatment tracks. We also extracted the most frequently co-occurring keyword pairs and clusters to help researchers recognize the synergies among different pain-related topics.
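    The KCN construction described above, keywords as nodes and co-occurrence counts as edge weights, can be sketched in a few lines. The keyword lists below are invented examples, not drawn from the 264,560-article corpus:

    ```python
    from collections import Counter
    from itertools import combinations

    # One keyword list per article (illustrative only).
    articles = [
        ["chronic pain", "opioids", "treatment"],
        ["chronic pain", "wearable sensors", "machine learning"],
        ["opioids", "treatment", "addiction"],
        ["wearable sensors", "machine learning", "chronic pain"],
    ]

    def build_kcn(keyword_lists):
        """Nodes = keywords; edge weight = number of articles where a pair co-occurs."""
        edges = Counter()
        for kws in keyword_lists:
            # sort so each undirected pair has one canonical key
            for pair in combinations(sorted(set(kws)), 2):
                edges[pair] += 1
        return edges

    kcn = build_kcn(articles)
    ```

    Ranking `kcn` by weight surfaces the most frequently co-occurring keyword pairs, and running any community-detection step on the resulting graph yields the topic clusters the abstract refers to.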

    Language Grounding in Massive Online Data
