Search CORE

International Evaluation of Research and Doctoral Training at the University of Helsinki 2005-2010 : RC-Specific Evaluation of ALKO - Algorithms and Data Analysis

Publication date: 01/01/2012

Helsingin yliopiston digitaalinen arkisto

Re-mining positive and negative association mining results

Author: Atan Tankut
Demiriz Ayhan
Ertek Gürdal
Kula Ufuk
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Positive and negative association mining are well-known and extensively studied data mining techniques for analyzing market basket data. Efficient algorithms exist to find both types of association, separately or simultaneously. Association mining operates on the transaction data. Although pricing and time information is an integral part of the transaction data, it has so far not been incorporated into market basket analysis, and additional attributes have been handled through quantitative association mining. In this paper, a new approach is proposed that incorporates price, time and domain-related attributes into data mining by re-mining the association mining results. The underlying factors behind the positive and negative relationships indicated by the association rules are characterized and described through a second data mining stage, re-mining. The applicability of the methodology is demonstrated by analyzing data from the apparel retailing industry, where price markdown is an essential tool for promoting sales and increasing revenue.

Isik University Academic Open Access

Sabanci University Research Database
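
The distinction between positive and negative associations in the re-mining abstract above can be illustrated with a minimal sketch. It is not the paper's re-mining method: it only labels item pairs by lift (above 1 positive, below 1 negative), and the function name `pair_associations`, the `min_support` threshold and the apparel transactions are hypothetical.

```python
from itertools import combinations


def pair_associations(transactions, min_support=0.1):
    """Label item pairs as positively or negatively associated by lift.

    lift(A, B) = P(A and B) / (P(A) * P(B)). Lift above 1 suggests a
    positive association, lift below 1 a negative one; pairs with lift
    exactly 1 (independence) are omitted. Items whose own support is
    below min_support are skipped. (Hypothetical helper, for illustration.)
    """
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))
    support = {i: sum(i in basket for basket in baskets) / n for i in items}
    frequent = [i for i in items if support[i] >= min_support]
    result = {}
    for a, b in combinations(frequent, 2):
        joint = sum(a in basket and b in basket for basket in baskets) / n
        lift = joint / (support[a] * support[b])
        if lift > 1:
            result[(a, b)] = ("positive", lift)
        elif lift < 1:
            result[(a, b)] = ("negative", lift)
    return result


# Hypothetical apparel transactions, for illustration only.
transactions = [
    ["shirt", "tie"],
    ["shirt", "tie"],
    ["shirt", "jeans"],
    ["jeans", "sneakers"],
    ["jeans", "sneakers"],
]
rules = pair_associations(transactions)
for pair, (kind, lift) in sorted(rules.items()):
    print(pair, kind, round(lift, 2))
```

In the approach the abstract describes, rules like these would then become the input of a second mining stage enriched with price, time and domain attributes; that stage is not shown here.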

Graph based Anomaly Detection and Description: A Survey

Author: Danai Koutra
Hanghang Tong
Leman Akoglu
Publication date: 28/04/2014

Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, the growing ubiquity of graph data has recently shifted the focus to techniques for structured graph data. Because objects in graphs exhibit long-range correlations, a suite of novel techniques has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms, categorized under various settings: unsupervised vs. (semi-)supervised approaches, static vs. dynamic graphs, and attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. Moreover, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude the survey with a discussion of open theoretical and practical challenges in the field.

arXiv.org e-Print Archive

CiteSeerX
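
The survey above covers far richer methods. As a point of reference only, the simplest setting it names (unsupervised detection on a static, plain graph) can be illustrated with a degree z-score baseline. This baseline, the function `degree_zscore_anomalies`, its `threshold` and the hub-and-ring graph are assumptions chosen for illustration, not techniques attributed to the survey.

```python
from statistics import mean, pstdev


def degree_zscore_anomalies(edges, threshold=2.0):
    """Flag nodes whose degree deviates strongly from the mean degree.

    edges: (u, v) pairs of a simple, undirected, unweighted graph
    (no self-loops or duplicate edges assumed).
    Returns {node: z_score} for nodes with |z_score| >= threshold.
    (Hypothetical baseline, for illustration.)
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    values = list(degree.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {}  # all degrees equal: nothing stands out
    scores = {node: (d - mu) / sigma for node, d in degree.items()}
    return {node: z for node, z in scores.items() if abs(z) >= threshold}


# Hypothetical graph: a 10-node ring plus one hub linked to every ring node.
edges = [(i, (i + 1) % 10) for i in range(10)] + [("hub", i) for i in range(10)]
anomalies = degree_zscore_anomalies(edges)
print(anomalies)
```

The z-score doubles as a crude form of attribution (the node is flagged because its degree is unusually high), which is far weaker than the attribution techniques the survey discusses.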

Combining data mining and evolutionary computation for multi-criteria optimization of earthworks

Author: Correia A. Gomes
Cortez Paulo
Parente Manuel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Earthworks aim to level the ground surface at a target construction area and precede any kind of structural construction (e.g., road and railway construction). They comprise sequential tasks, such as excavation, transportation, spreading and compaction, and depend strongly on heavy mechanical equipment and repetitive processes. In this context, it is essential to optimize the usage of all available resources under two key criteria: the cost and the duration of earthwork projects. In this paper, we present an integrated system that uses two artificial intelligence based techniques: data mining and evolutionary multi-objective optimization. The former is used to build data-driven models capable of providing realistic estimates of resource productivity, while the latter is used to optimize resource allocation with respect to the two main earthwork objectives (duration and cost). Experiments using real-world data from a construction site have shown that the proposed system is competitive with current manual earthwork design.

Universidade do Minho: RepositoriUM

Crossref
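
The two-objective setting of the earthworks paper above (duration and cost) can be made concrete with a small sketch of Pareto dominance, the notion a multi-objective evolutionary optimizer uses to compare candidate solutions. The sketch does not reproduce the paper's system: there is no data-mining productivity model and no evolutionary search, and the functions `dominates` and `pareto_front` and the candidate allocations (in arbitrary units) are hypothetical.

```python
def dominates(a, b):
    """True if allocation a is no worse than b on both objectives
    (duration, cost; both minimized) and strictly better on at least one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical allocations: (name, duration, cost) in arbitrary units.
allocations = [
    ("A", 30, 100),
    ("B", 25, 120),
    ("C", 35, 90),
    ("D", 32, 110),
    ("E", 25, 130),
]
front = pareto_front(allocations)
print([name for name, _, _ in front])  # ['A', 'B', 'C']
```

A candidate never dominates itself (it is not strictly better on either objective), so no self-exclusion check is needed. D is dominated by A and E by B; the remaining three are trade-offs a planner would choose among.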

Efficient Large Scale Clustering based on Data Partitioning

Author: Bendechache Malika
Kechadi Tahar
Le-Khac Nhien-An
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/10/2016

3rd IEEE International Conference on Data Science and Advanced Analytics (DSAA 2016), Montreal, Canada, 17-19 October, 2016

Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges, such as high-dimensional data, heterogeneity, and the high complexity of some algorithms. For instance, some algorithms may have linear complexity but require domain knowledge to determine their input parameters. Distributed clustering techniques constitute a very good answer to big data challenges (e.g., Volume, Variety, Veracity, and Velocity). These techniques usually consist of two phases. The first phase generates local models or patterns, and the second aggregates the local results to obtain global models. While the first phase can be executed in parallel on each site and is therefore efficient, the aggregation phase is complex and time-consuming and may produce incorrect and ambiguous global clusters, and therefore incorrect models. In this paper we propose a new distributed clustering approach that deals efficiently with both phases: the generation of local results and the generation of global models by aggregation. For the first phase, our approach can analyse the datasets located at each site using different clustering techniques. The aggregation phase is designed so that the final clusters are compact and accurate while the overall process remains efficient in time and memory allocation. For the evaluation, we use two well-known clustering algorithms: K-Means and DBSCAN. A key property of this distributed clustering technique is that the number of global clusters is dynamic and does not need to be fixed in advance. Experimental results show that the approach is scalable and produces high-quality results.

Science Foundation Ireland

Research Repository UCD

Irish Universities
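
The two phases described in the distributed clustering abstract above can be sketched as follows, assuming each site runs K-Means locally and the aggregation step simply merges local centroids that lie within a hypothetical `radius`. This centroid-merging rule is a stand-in chosen for brevity, not the paper's aggregation procedure; the function names and the points are made up for illustration.

```python
import math


def mean_point(points):
    """Unweighted mean of a list of 2-D points."""
    return (sum(x for x, _ in points) / len(points),
            sum(y for _, y in points) / len(points))


def kmeans(points, k, iters=20):
    """Phase 1 (local): plain Lloyd's k-means; centroids start at the first k points."""
    centroids = list(points[:k])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda idx: math.dist(p, centroids[idx]))
            groups[nearest].append(p)
        # An empty group keeps its previous centroid.
        centroids = [mean_point(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids


def merge_local_models(local_centroids, radius):
    """Phase 2 (global, hypothetical rule): greedily group local centroids
    that lie within `radius` of a group member, then average each group.

    The number of global clusters is not fixed in advance; it follows
    from how many groups of nearby local centroids exist.
    """
    global_groups = []
    for c in local_centroids:
        for group in global_groups:
            if any(math.dist(c, g) <= radius for g in group):
                group.append(c)
                break
        else:
            global_groups.append([c])
    return [mean_point(g) for g in global_groups]


# Hypothetical data held at two sites.
site1 = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
site2 = [(0.5, 0.5), (9.5, 9.5), (0, 0.5), (10, 9.5)]
local = [c for site in (site1, site2) for c in kmeans(site, k=2)]
global_model = merge_local_models(local, radius=2.0)
print(global_model)
```

The greedy merge is order-dependent and averages centroids without weighting them by cluster size, so it only illustrates the local/global split, not the compactness and accuracy guarantees the paper targets.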

Sequential patterns for describing satellite image time series and forecasting events

Author: Méger Nicolas
Publication venue: HAL CCSD
Publication date: 29/03/2013

The work presented concerns knowledge extraction from data for the purposes of description and inference. How can Satellite Image Time Series (SITS) be described in an unsupervised way? How can events such as failures in complex systems be forecast? Original answers are developed, based on data mining techniques that extract local patterns, namely sequential patterns. New patterns, Grouped Frequent Sequential patterns (GFS-patterns), are thus proposed to extract from a SITS groups of pixels that are spatially and temporally meaningful. An original technique for pushing the constraints associated with these patterns into the extraction process is also detailed. Experiments on optical and radar data, at different resolutions, confirm their potential. A ranking of these patterns based on mutual information and swap randomization is also proposed, in order to highlight patterns that are unlikely to appear in a random dataset in which frequencies are preserved, that express changes, and that progress through space. For event forecasting, a leave-one-out approach is proposed to select sequential patterns, FLM-rules, that are generic and trigger as few false alarms as possible. An earliest-possible forecasting method exploiting these patterns is also put forward and validated on real data from complex mechanical systems. The experiments show that it is possible to forecast failures for which technical expertise is insufficient. This forecasting method is now patented.

Thèses en Ligne

Hal - Université Grenoble Alpes

HAL Université de Savoie
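
The thesis above ranks patterns against random datasets in which frequencies are preserved. Swap randomization produces such datasets for 0/1 data. The sketch below assumes the data is a plain binary matrix (the thesis works with sequential patterns over image time series, which would need a suitable adaptation) and omits the mutual-information part of the ranking; `swap_randomize`, `empirical_p_value` and the data are hypothetical illustrations.

```python
import random


def swap_randomize(matrix, n_swaps, seed=0):
    """Return a randomized copy of a 0/1 matrix with identical row and column sums.

    Each accepted swap takes two 1-cells (r1, c1) and (r2, c2) for which
    (r1, c2) and (r2, c1) are 0, and moves the ones there. Every row and
    every column keeps its number of ones, i.e. its frequency.
    """
    rng = random.Random(seed)
    m = [row[:] for row in matrix]  # leave the input untouched
    ones = [(r, c) for r, row in enumerate(m) for c, v in enumerate(row) if v]
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(ones)), 2)
        (r1, c1), (r2, c2) = ones[i], ones[j]
        if r1 == r2 or c1 == c2 or m[r1][c2] or m[r2][c1]:
            continue  # this swap would break a margin or overwrite a 1
        m[r1][c1] = m[r2][c2] = 0
        m[r1][c2] = m[r2][c1] = 1
        ones[i], ones[j] = (r1, c2), (r2, c1)
        done += 1
    return m


def empirical_p_value(matrix, cols, n_samples=200, n_swaps=50):
    """Add-one estimate of how often a randomized dataset has the column set
    `cols` co-occurring in at least as many rows as the original data."""
    def support(m):
        return sum(all(row[c] for c in cols) for row in m)

    observed = support(matrix)
    hits = sum(
        support(swap_randomize(matrix, n_swaps, seed=s)) >= observed
        for s in range(n_samples)
    )
    return (hits + 1) / (n_samples + 1)


# Hypothetical binary data: rows could be pixels, columns observed events.
data = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
shuffled = swap_randomize(data, n_swaps=20)
print(empirical_p_value(data, cols=[0, 1]))
```

A small value suggests the co-occurrence is unlikely under margin-preserving randomization; how the thesis combines this with mutual information is not reproduced here.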

Research Self-Evaluation 2003-2008, Computer Science Department, University of Twente.

Author: Aksit Mehmet
Apers Peter M.G.
Hartel Pieter H.
Haverkort Boudewijn R.H.M.
Havinga Paul J.M.
Nijholt Antinus
Pras Aiko
Rensink Arend
van de Pol Jan Cornelis
van Sinderen Marten J.
Wieringa Roelf J.
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 01/01/2009

University of Twente Research Information