260 research outputs found

    Re-mining positive and negative association mining results

    Positive and negative association mining are well-known and extensively studied data mining techniques for analyzing market basket data. Efficient algorithms exist to find both types of association, separately or simultaneously. Association mining operates on the transaction data. Although pricing and time information is an integral part of the transaction data, it has not so far been incorporated into market basket analysis, and additional attributes have been handled using quantitative association mining. In this paper, a new approach is proposed to incorporate price, time and domain-related attributes into data mining by re-mining the association mining results. The underlying factors behind positive and negative relationships, as indicated by the association rules, are characterized and described through a second data mining stage, re-mining. The applicability of the methodology is demonstrated by analyzing data from the apparel retailing industry, where price markdown is an essential tool for promoting sales and generating increased revenue.
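    As a rough illustration of what "positive" and "negative" association can mean for market basket data, the sketch below labels item pairs by lift. This is an assumption made for illustration: the abstract does not say which interestingness measure the paper uses, and the baskets, item names, and `tolerance` threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical toy baskets; item names are illustrative only.
transactions = [
    {"shirt", "tie"},
    {"shirt", "tie", "belt"},
    {"shirt", "tie"},
    {"jeans", "belt"},
    {"jeans"},
    {"jeans", "belt"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def lift(a, b, baskets):
    """support(a, b) / (support(a) * support(b)); 1.0 means independence."""
    return support({a, b}, baskets) / (support({a}, baskets) * support({b}, baskets))

def classify_pairs(baskets, tolerance=0.1):
    """Label each item pair as positive, negative, or neutral by its lift."""
    items = sorted(set().union(*baskets))
    labels = {}
    for a, b in combinations(items, 2):
        value = lift(a, b, baskets)
        if value > 1 + tolerance:
            label = "positive"
        elif value < 1 - tolerance:
            label = "negative"
        else:
            label = "neutral"
        labels[(a, b)] = (label, value)
    return labels

pair_labels = classify_pairs(transactions)
```

    In a re-mining setting, a table like `pair_labels` would become the input of a second analysis step that joins in price and time attributes; that step is not sketched here because the abstract gives no detail about it.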

    Graph based Anomaly Detection and Description: A Survey

    Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, graph data is becoming ubiquitous, and techniques for structured graph data have recently come into focus. As objects in graphs have long-range correlations, a suite of novel techniques has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms, categorized under various settings: unsupervised vs. (semi-)supervised approaches, static vs. dynamic graphs, and attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. Moreover, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion of open theoretical and practical challenges in the field.

    Combining data mining and evolutionary computation for multi-criteria optimization of earthworks

    Earthworks tasks aim at levelling the ground surface at a target construction area and precede any kind of structural construction (e.g., road and railway construction). Earthworks comprise sequential tasks, such as excavation, transportation, spreading and compaction, and rely heavily on heavy mechanical equipment and repetitive processes. In this context, it is essential to optimize the usage of all available resources under two key criteria: the cost and duration of earthwork projects. In this paper, we present an integrated system that uses two artificial intelligence based techniques: data mining and evolutionary multi-objective optimization. The former is used to build data-driven models capable of providing realistic estimates of resource productivity, while the latter is used to optimize resource allocation considering the two main earthwork objectives (duration and cost). Experiments using real-world data from a construction site have shown that the proposed system is competitive when compared with current manual earthwork design.

    Efficient Large Scale Clustering based on Data Partitioning

    3rd IEEE International Conference on Data Science and Advanced Analytics (DSAA 2016), Montreal, Canada, 17-19 October 2016. Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges, such as high-dimensional data, heterogeneity, and the high complexity of some algorithms. For instance, some algorithms may have linear complexity but require domain knowledge in order to determine their input parameters. Distributed clustering techniques constitute a very good alternative for meeting the big data challenges (e.g., Volume, Variety, Veracity, and Velocity). Usually these techniques consist of two phases. The first phase generates local models or patterns, and the second aggregates the local results to obtain global models. While the first phase can be executed in parallel on each site and is therefore efficient, the aggregation phase is complex and time consuming, and it may produce incorrect and ambiguous global clusters and therefore incorrect models. In this paper we propose a new distributed clustering approach to deal efficiently with both phases: generation of local results and generation of global models by aggregation. For the first phase, our approach is capable of analysing the datasets located in each site using different clustering techniques. The aggregation phase is designed in such a way that the final clusters are compact and accurate while the overall process is efficient in time and memory allocation. For the evaluation, we use two well-known clustering algorithms: K-Means and DBSCAN. One of the key outputs of this distributed clustering technique is that the number of global clusters is dynamic; it does not need to be fixed in advance. Experimental results show that the approach is scalable and produces high quality results. Science Foundation Irelan
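    The two-phase structure described in this abstract (local clustering per site, then aggregation into global clusters) can be sketched roughly as follows. This is not the authors' aggregation method, which the abstract does not specify: merging nearby local centroids is a stand-in chosen for brevity, and the site data, `k`, and `merge_radius` are hypothetical. Sites are processed sequentially here, whereas the abstract notes they can run in parallel.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; assumes len(points) >= k.

    Starts from k evenly spaced input points so the result is deterministic.
    """
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group
            else centroids[i]
            for i, group in enumerate(groups)
        ]
    return centroids

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def aggregate(local_centroids, merge_radius):
    """Greedily merge local centroids that lie within merge_radius.

    The number of global clusters is not fixed in advance.
    """
    merged = []  # each entry holds the local centroids merged so far
    for c in local_centroids:
        for members in merged:
            if math.dist(c, mean_point(members)) <= merge_radius:
                members.append(c)
                break
        else:
            merged.append([c])
    return [mean_point(members) for members in merged]

def distributed_cluster(sites, k, merge_radius):
    """Phase 1: cluster each site locally. Phase 2: aggregate the centroids."""
    local = [c for site in sites for c in kmeans(site, k)]
    return aggregate(local, merge_radius)

# Hypothetical data held at two sites.
site_a = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
site_b = [(0.5, 0.5), (1, 1), (0, 0.5), (20, 20), (20, 21), (21, 20)]
global_centroids = distributed_cluster([site_a, site_b], k=2, merge_radius=2.0)
```

    Each site contributes two local clusters, yet the aggregation step yields three global ones because the clusters near the origin overlap across sites. Only the centroids travel between phases, which keeps the communication cost small in this simplified setting.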

    Motifs séquentiels pour la description de séries temporelles d'images satellitaires et la prévision d'événements [Sequential patterns for the description of satellite image time series and event prediction]

    The work presented concerns knowledge discovery in data for the purposes of description and inference. How can Satellite Image Time Series (SITS) be described in an unsupervised way? How can events such as failures in complex systems be predicted? Original answers are developed, based on data mining techniques that extract local patterns, namely sequential patterns. New patterns, Grouped Frequent Sequential patterns ("motifs SFG" in the original French), are proposed in order to extract from a SITS groups of pixels that are meaningful both spatially and temporally. An original technique for pushing the constraints associated with these patterns into the extraction process is also detailed. Experiments on optical and radar data, at different resolutions, confirm their potential. A ranking of these patterns based on mutual information and swap randomization is also proposed, in order to highlight patterns that are unlikely to appear in a random dataset in which frequencies are preserved, and that express changes and progress through space. As for event prediction, a leave-one-out approach is proposed to select sequential patterns, the FLM-rules, that are generic and trigger as few false alarms as possible. An earliest-possible prediction method that exploits these patterns is also put forward and validated on real data from complex mechanical systems. The experiments show that it is possible to predict failures for which technical expertise is insufficient. This prediction method is now patented.