Re-mining positive and negative association mining results
Positive and negative association mining are well-known and extensively studied data mining techniques for analyzing market basket data. Efficient algorithms exist to find both types of association, separately or simultaneously. Association mining is performed by operating on the transaction data. Despite being an integral part of the transaction data, pricing and time information has not been incorporated into market basket analysis so far, and additional attributes have been handled using quantitative association mining. In this paper, a new approach is proposed to incorporate price, time and domain-related attributes into data mining by re-mining the association mining results. The underlying factors behind positive and negative relationships, as indicated by the association rules, are characterized and described through a second data mining stage, re-mining. The applicability of the methodology is demonstrated by analyzing data from the apparel retailing industry, where price markdown is an essential tool for promoting sales and generating increased revenue.
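The first stage described in this abstract, finding positive and negative associations between items, can be illustrated with a minimal sketch. It assumes lift as the interestingness measure (lift above 1 read as a positive association, below 1 as a negative one); the abstract does not say which measure the paper uses, and the function names and basket data are hypothetical. The re-mining stage itself is not shown.

```python
from itertools import combinations


def pair_lifts(transactions):
    """Return {(a, b): lift} for every pair of items seen in the baskets.

    Assumes a non-empty list of non-empty baskets.
    """
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))
    # Support of a single item: fraction of baskets that contain it.
    support = {i: sum(i in b for b in baskets) / n for i in items}
    lifts = {}
    for a, b in combinations(items, 2):
        joint = sum(a in bk and b in bk for bk in baskets) / n
        lifts[(a, b)] = joint / (support[a] * support[b])
    return lifts


def label(lift, tol=1e-9):
    """Classify a lift value (assumed convention, not taken from the paper)."""
    if lift > 1 + tol:
        return "positive"
    if lift < 1 - tol:
        return "negative"
    return "independent"


# Hypothetical apparel baskets, for illustration only.
baskets = [
    ["shirt", "tie"],
    ["shirt", "tie"],
    ["shirt", "jeans"],
    ["jeans"],
]
lifts = pair_lifts(baskets)
for pair, value in lifts.items():
    print(pair, round(value, 3), label(value))
```

In a re-mining setting, each pair and its label would become one row of a second dataset, joined with attributes such as price or time, which is the part this sketch leaves out.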
Graph based Anomaly Detection and Description: A Survey
Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have recently come into focus. As objects in graphs have long-range correlations, a suite of novel techniques has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, for static vs. dynamic graphs, for attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the "why", of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field.
Combining data mining and evolutionary computation for multi-criteria optimization of earthworks
Earthworks aim at levelling the ground surface at a target construction area and precede any kind of structural construction (e.g., road and railway construction). They comprise sequential tasks, such as excavation, transportation, spreading and compaction, and rely heavily on mechanical equipment and repetitive processes. In this context, it is essential to optimize the usage of all available resources under two key criteria: the cost and duration of earthwork projects. In this paper, we present an integrated system that uses two artificial intelligence based techniques: data mining and evolutionary multi-objective optimization. The former is used to build data-driven models capable of providing realistic estimates of resource productivity, while the latter is used to optimize resource allocation considering the two main earthwork objectives (duration and cost). Experiments using real-world data from a construction site show that the proposed system is competitive when compared with current manual earthwork design.
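The two-criteria trade-off described above can be made concrete with a small sketch of Pareto dominance, the notion that multi-objective optimizers rely on when both cost and duration are minimized. This is not the paper's evolutionary algorithm: it only filters a fixed list of candidates, and the (cost, duration) values are hypothetical.

```python
def dominates(p, q):
    """True if p is no worse than q on every objective and strictly better
    on at least one (all objectives are minimized)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical resource allocations as (cost, duration) pairs.
candidates = [(100, 30), (120, 22), (150, 20), (130, 25), (160, 21)]
front = pareto_front(candidates)
print(front)
```

An evolutionary search would generate and refine such candidates over many generations; the filter above is only the comparison step that decides which allocations are worth keeping.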
Efficient Large Scale Clustering based on Data Partitioning
3rd IEEE International Conference on Data Science and Advanced Analytics (DSAA 2016), Montreal, Canada, 17-19 October, 2016
Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high dimensionality, heterogeneity, and the high complexity of some algorithms. For instance, some algorithms may have linear complexity but require domain knowledge to determine their input parameters. Distributed clustering techniques constitute a very good alternative for the big data challenges (e.g., Volume, Variety, Veracity, and Velocity). Usually these techniques consist of two phases. The first phase generates local models or patterns and the second aggregates the local results to obtain global models. While the first phase can be executed in parallel on each site and is therefore efficient, the aggregation phase is complex and time consuming, and may produce incorrect and ambiguous global clusters and therefore incorrect models. In this paper we propose a new distributed clustering approach to deal efficiently with both phases: generation of local results and generation of global models by aggregation. For the first phase, our approach is capable of analysing the datasets located in each site using different clustering techniques. The aggregation phase is designed in such a way that the final clusters are compact and accurate while the overall process is efficient in time and memory allocation. For the evaluation, we use two well-known clustering algorithms: K-Means and DBSCAN. One of the key outputs of this distributed clustering technique is that the number of global clusters is dynamic; it does not need to be fixed in advance. Experimental results show that the approach is scalable and produces high quality results.
Science Foundation Irelan
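The two-phase scheme in this abstract (local models per site, then aggregation into global clusters) can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: a simple leader clustering stands in for K-Means or DBSCAN in the local phase, each site's model is assumed to be a list of (centroid, size) summaries, and aggregation merges summaries whose centroids lie within a hypothetical `merge_dist`. The number of global clusters is not fixed in advance; it falls out of the merging.

```python
from math import dist  # Python 3.8+


def centroid(cluster):
    """Mean point of a non-empty list of equal-length tuples."""
    n = len(cluster)
    return tuple(sum(xs) / n for xs in zip(*cluster))


def leader_clusters(points, radius):
    """Local phase stand-in: put each point into the first cluster whose
    current centroid is within `radius`, else start a new cluster."""
    clusters = []
    for p in points:
        for c in clusters:
            if dist(p, centroid(c)) <= radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def local_model(points, radius):
    """Summary a site would send: one (centroid, size) pair per local cluster."""
    return [(centroid(c), len(c)) for c in leader_clusters(points, radius)]


def aggregate(models, merge_dist):
    """Global phase: merge local summaries whose centroids are close,
    using a size-weighted mean for the merged centroid."""
    merged = []  # each entry is [centroid, size]
    for model in models:
        for c, n in model:
            for g in merged:
                if dist(c, g[0]) <= merge_dist:
                    total = g[1] + n
                    g[0] = tuple((a * g[1] + b * n) / total for a, b in zip(g[0], c))
                    g[1] = total
                    break
            else:
                merged.append([c, n])
    return [(g[0], g[1]) for g in merged]


# Hypothetical data held at two sites.
site_a = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0)]
site_b = [(0.1, 0.3), (9.0, 9.0), (9.2, 8.8)]
models = [local_model(s, radius=1.0) for s in (site_a, site_b)]
global_clusters = aggregate(models, merge_dist=1.0)
print(global_clusters)
```

Only the summaries travel between sites, which is what keeps the first phase parallel; the quality of the global result then depends on how much information those summaries carry.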
Motifs séquentiels pour la description de séries temporelles d'images satellitaires et la prévision d'événements (Sequential patterns for describing satellite image time series and predicting events)
The work presented concerns knowledge extraction from data for the purposes of description and inference. How can Satellite Image Time Series (SITS) be described in an unsupervised mode? How can events such as failures in complex systems be predicted? Original answers are developed, relying on data mining techniques that extract local patterns, namely sequential patterns. New patterns, the Grouped Frequent Sequential patterns (GFS patterns), are proposed in order to extract from a SITS groups of pixels that make sense spatially and temporally. An original technique for pushing the constraints associated with these patterns into the extraction process is also detailed. Experiments on optical and radar data, at different resolutions, confirm their potential. A ranking of these patterns based on mutual information and swap randomization is also proposed, in order to highlight patterns that are unlikely to appear in a random dataset in which frequencies are preserved, that express changes, and that progress in space. As for event prediction, a leave-one-out approach is proposed to select sequential patterns, the FLM-rules, that are generic and trigger as few false alarms as possible. An earliest-possible prediction method that takes advantage of these patterns is also put forward and validated on real data from complex mechanical systems. The experiments conducted show that it is possible to predict failures for which technical expertise is insufficient. This prediction method is now patented.