9 research outputs found

    An investigation into the issues of multi-agent data mining

    Multi-agent systems (MAS) often deal with complex applications that require distributed problem solving. In many applications the individual and collective behaviour of the agents depends on the observed data from distributed sources. The field of Distributed Data Mining (DDM) deals with these challenges in analyzing distributed data and offers many algorithmic solutions to perform different data analysis and mining operations in a fundamentally distributed manner that pays careful attention to the resource constraints. Since multi-agent systems are often distributed and agents have proactive and reactive features, combining DM with MAS for data intensive applications is therefore appealing. This Chapter discusses a number of research issues concerned with the use of Multi-Agent Systems for Data Mining (MADM), also known as agent-driven data mining. The Chapter also examines the issues affecting the design and implementation of a generic and extendible agent-based data mining framework. An Extendible Multi-Agent Data mining System (EMADS) Framework for integrating distributed data sources is presented. This framework achieves high availability and high performance without compromising the data integrity and security. © 2010 Nova Science Publishers, Inc. All rights reserved.

    Hybrid Algorithm Selection and Hyperparameter Tuning on Distributed Machine Learning Resources: A Hierarchical Agent-based Approach

    Algorithm selection and hyperparameter tuning are critical steps in both academic and applied machine learning. These steps are becoming increasingly delicate, however, because of the rapid rise in the number, diversity, and distributedness of machine learning resources. Multi-agent systems, when applied to the design of machine learning platforms, bring several distinctive characteristics such as scalability, flexibility, and robustness. This paper proposes a fully automatic and collaborative agent-based mechanism for selecting distributedly organized machine learning algorithms and simultaneously tuning their hyperparameters. Our method builds upon an existing agent-based hierarchical machine-learning platform and augments its query structure to support these functionalities without being limited to specific learning, selection, and tuning mechanisms. We have conducted theoretical assessments, formal verification, and analytical study to demonstrate the correctness, resource utilization, and computational efficiency of our technique. According to the results, our solution is totally correct and exhibits linear time and space complexity in relation to the size of available resources. To provide concrete examples of how the proposed methodologies can adapt and perform across a range of algorithmic options and datasets, we have also conducted a series of experiments using a system comprising 24 algorithms and 9 datasets.
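The hierarchical idea in this abstract, in which a higher-level agent delegates tuning to per-algorithm agents and then selects among their reports, can be illustrated with a very small sketch. Everything here is an assumption made for illustration: the class names `AlgorithmAgent` and `RootAgent`, the single numeric hyperparameter, and the toy scoring functions are hypothetical and are not taken from the paper or its platform.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class AlgorithmAgent:
    """Hypothetical leaf agent: owns one algorithm and tunes it locally."""
    name: str
    score: Callable[[float], float]   # hyperparameter value -> validation score
    candidates: Sequence[float]       # hyperparameter values this agent may try

    def tune(self) -> Tuple[str, float, float]:
        # Evaluate every candidate once and report the best one upward.
        best = max(self.candidates, key=self.score)
        return self.name, best, self.score(best)


@dataclass
class RootAgent:
    """Hypothetical parent agent: forwards the query and selects among replies."""
    children: Sequence[AlgorithmAgent]

    def select(self) -> Tuple[str, float, float]:
        # Each child is queried exactly once; the highest reported score wins.
        return max((child.tune() for child in self.children), key=lambda r: r[2])


# Toy scoring functions standing in for real validation runs (assumed).
agents = [
    AlgorithmAgent("algo_a", lambda h: -(h - 2) ** 2, [1, 2, 3]),
    AlgorithmAgent("algo_b", lambda h: 1 - (h - 0.5) ** 2, [0.1, 0.5, 0.9]),
]
winner = RootAgent(agents).select()
```

In this toy version each candidate is scored by its own agent and each agent is queried once, so the work in the selection step grows with the number of agents; this is only an illustration of the delegation pattern, not a reproduction of the paper's complexity analysis.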

    Multi-agent data mining with negotiation: a study in multi-agent based clustering

    Multi-Agent Data Mining (MADM) seeks to harness the general advantages offered by Multi-Agent Systems (MAS) with respect to the domain of data mining. The research described in this thesis is concerned with Multi-Agent Based Clustering (MABC), that is, MADM to support clustering. To investigate the use of MAS technology with respect to data mining, and specifically data clustering, two approaches are proposed in this thesis. The first approach is a multi-agent based approach to clustering using a generic MADM framework whereby a collection of agents with different capabilities are allowed to collaborate to produce a "best" set of clusters. The framework supports three clustering paradigms: K-means, K-NN and divisive hierarchical clustering. A number of experiments were conducted using benchmark UCI data sets and designed to demonstrate that the proposed MADM approach can identify a best set of clusters using the following clustering metrics: F-measure, Within Group Average Distance (WGAD) and Between Group Average Distance (BGAD). The results demonstrated that the MADM framework could successfully be used to find a best cluster configuration. The second approach is an extension of the proposed initial MADM framework whereby a "best" cluster configuration could be found using cooperation and negotiation among agents. The novel feature of the extended framework is that it adopts a two-phase approach to clustering. Phase one is similar to the established centralised clustering approach (except that it is conducted in a decentralised manner). Phase two comprises a negotiation phase where agents "swap" unwanted records so as to improve a cluster configuration. A set of performatives is proposed as part of a negotiation protocol to facilitate negotiation between agents. It is this negotiation capability which is the central contribution of the work described in this thesis.
    An extensive evaluation of the extended framework was conducted using: (i) benchmark UCI data sets and (ii) a welfare benefits data set that provides an exemplar application. Evaluation of the framework clearly demonstrates that, in the majority of cases, this negotiation phase serves to produce a better cluster configuration (in terms of cohesion and separation) than that produced using a simple centralised approach.
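As a rough illustration of how cohesion and separation can be used to compare candidate cluster configurations, the sketch below computes two distance-based scores. The abstract does not define WGAD or BGAD, so the definitions used here are assumptions: WGAD is taken as the average distance from each record to its own cluster centroid, and BGAD as the average distance from each cluster centroid to the centroid of the whole data set. The function names are likewise hypothetical.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    return tuple(fmean(axis) for axis in zip(*points))


def wgad(clusters):
    """Assumed definition: mean distance from each record to its cluster centroid
    (lower means more cohesive clusters)."""
    return fmean(dist(p, centroid(c)) for c in clusters for p in c)


def bgad(clusters):
    """Assumed definition: mean distance from each cluster centroid to the
    global centroid (higher means better separated clusters)."""
    global_centroid = centroid([p for c in clusters for p in c])
    return fmean(dist(centroid(c), global_centroid) for c in clusters)


# Two candidate configurations of the same four records.
good = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
bad = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]

# Pick the configuration with the best cohesion under the assumed WGAD.
best = min([good, bad], key=wgad)
```

Under these assumed definitions the `good` configuration has both a lower WGAD and a higher BGAD than `bad`, which is the sense in which one configuration can be called better in terms of cohesion and separation.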

    Privacy-preserving distributed data mining

    This thesis is concerned with privacy-preserving distributed data mining algorithms. The main challenges in this setting are inference attacks and the formation of collusion groups. The inference problem is the reconstruction of sensitive data by attackers from non-sensitive sources, such as intermediate results, exchanged messages, or public information. Moreover, in a distributed scenario, malicious insiders can organize collusion groups to deploy more effective inference attacks. This thesis shows that existing privacy measures do not adequately protect privacy against inference and collusion. Therefore, in this thesis, new measures based on information theory are developed to overcome the identified limitations. Furthermore, a new distributed data clustering algorithm is presented. The clustering approach is based on a kernel density estimate approximation that generates a controlled amount of ambiguity in the density estimates and provides privacy to original data. In addition, this thesis introduces the first privacy-preserving algorithms for frequent pattern discovery in distributed time series. Time series are transformed into a set of n-dimensional data points, and finding frequent patterns is reduced to finding local maxima in the n-dimensional density space. The proposed algorithms are linear in the size of the dataset with low communication costs, validated by experimental evaluation using different datasets.
    This thesis addresses privacy-preserving data mining in distributed environments, with a focus on selected N-agent attack scenarios for the inference problem in data clustering and time series analysis. These are attacks by individual agents or subgroups of agents within a distributed data mining group, or by a single agent outside this group.
    First, the thesis presents two new privacy measures which, unlike existing ones, satisfy the properties generally required for privacy preservation in distributed data mining, and in which the measured degree of privacy relates to the data analysis method used and to the number of attackers. For privacy-preserving distributed data clustering, a new method based on kernel density estimation, called KDECS, is presented. KDECS uses an approximation of the original local kernel density estimate, so that the original data of other agents in the data mining group can no longer be reconstructed with a probability higher than a given value. The method is provably more secure than data clustering with generative mixture models and SMC-based secure k-means data clustering. In addition, we present new methods, called DPD-TS, DPD-HE and DPD-FS, for privacy-preserving distributed pattern discovery in time series, whose complexity and security level we analyse using the new privacy measures mentioned above. The minimum security level of DPD-TS and DPD-FS specified by each agent of a data mining group depends only on the dimensionality reduction of the time series values and on their discretisation, and can be checked easily. The DPD-HE method offers even better protection of sensitive data by means of homomorphic encryption. Alongside the theoretical analysis, experimental performance evaluations of the developed methods were carried out on various publicly available datasets.
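The reduction described in this abstract, from frequent patterns in a time series to local maxima of a density over n-dimensional points, can be sketched in a few lines. This is a minimal, non-private illustration and not the thesis's KDECS or DPD algorithms: the sliding-window embedding, the Gaussian kernel, and the parameters `h`, `radius` and `min_density` are all assumptions chosen for the example.

```python
from math import dist, exp


def embed(series, n):
    """Turn a series into n-dimensional points using sliding windows (assumed embedding)."""
    return [tuple(series[i:i + n]) for i in range(len(series) - n + 1)]


def density(x, points, h):
    """Unnormalised Gaussian kernel density estimate at x with bandwidth h."""
    return sum(exp(-dist(x, p) ** 2 / (2 * h * h)) for p in points)


def density_peaks(points, h, radius, min_density):
    """Return points that are local density maxima: dense enough, and not
    exceeded in density by any other point within `radius`."""
    dens = {p: density(p, points, h) for p in set(points)}
    peaks = []
    for p, d in dens.items():
        if d < min_density:
            continue  # too sparse to count as a frequent pattern
        if all(d >= dq for q, dq in dens.items() if dist(p, q) <= radius):
            peaks.append(p)
    return peaks


# A series in which the pattern (0, 1) recurs four times.
series = [0, 1, 0, 1, 0, 1, 5, 0, 1]
points = embed(series, 2)
frequent = density_peaks(points, h=0.5, radius=2.0, min_density=3.0)
```

The privacy-preserving part of the thesis concerns how agents exchange approximated density estimates rather than raw data; this sketch covers only the centralised density-peak step.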

    EMADS: An Extendible Multi-Agent Data Miner

    In this paper we describe EMADS, an Extendible Multi-Agent Data mining System. The EMADS vision is that of a community of data mining agents, contributed by many individuals, interacting under decentralised control to address data mining requests. EMADS is seen both as an end user application and a research tool. This paper details the EMADS vision, the associated conceptual framework and the current implementation. Although EMADS may be applied to many data mining tasks, the study described here, for the sake of brevity, concentrates on agent-based data classification. A full description of EMADS is presented.