1,427 research outputs found

    A customizable multi-agent system for distributed data mining

    Get PDF
    We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design allows a separation of the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability performances

    Dynamic load balancing in parallel KD-tree k-means

    Get PDF
    One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy

    Dynamic load balancing for the distributed mining of molecular structures

    Get PDF
    In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids

    Effectiveness of landmark analysis for establishing locality in p2p networks

    Get PDF
    Locality to other nodes on a peer-to-peer overlay network can be established by means of a set of landmarks shared among the participating nodes. Each node independently collects a set of latency measures to landmark nodes, which are used as a multi-dimensional feature vector. Each peer node uses the feature vector to generate a unique scalar index which is correlated to its topological locality. A popular dimensionality reduction technique is the space filling Hilbert’s curve, as it possesses good locality preserving properties. However, there exists little comparison between Hilbert’s curve and other techniques for dimensionality reduction. This work carries out a quantitative analysis of their properties. Linear and non-linear techniques for scaling the landmark vectors to a single dimension are investigated. Hilbert’s curve, Sammon’s mapping and Principal Component Analysis have been used to generate a 1d space with locality preserving properties. This work provides empirical evidence to support the use of Hilbert’s curve in the context of locality preservation when generating peer identifiers by means of landmark vector analysis. A comparative analysis is carried out with an artificial 2d network model and with a realistic network topology model with a typical power-law distribution of node connectivity in the Internet. Nearest neighbour analysis confirms Hilbert’s curve to be very effective in both artificial and realistic network topologies. Nevertheless, the results in the realistic network model show that there is scope for improvements and better techniques to preserve locality information are required

    Active networks: an evolution of the internet

    Get PDF
    Active Networks can be seen as an evolution of the classical model of packet-switched networks. The traditional and ”passive” network model is based on a static definition of the network node behaviour. Active Networks propose an “active” model where the intermediate nodes (switches and routers) can load and execute user code contained in the data units (packets). Active Networks are a programmable network model, where bandwidth and computation are both considered shared network resources. This approach opens up new interesting research fields. This paper gives a short introduction of Active Networks, discusses the advantages they introduce and presents the research advances in this field

    Efficient mining of discriminative molecular fragments

    Get PDF
    Frequent pattern discovery in structured data is receiving an increasing attention in many application areas of sciences. However, the computational complexity and the large amount of data to be explored often make the sequential algorithms unsuitable. In this context high performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimation is available for task workloads, which show a power-law distribution in a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well known National Cancer Institute’s HIV-screening dataset

    Performance and prediction: Bayesian modelling of fallible choice in chess

    Get PDF
    Evaluating agents in decision-making applications requires assessing their skill and predicting their behaviour. Both are well developed in Poker-like situations, but less so in more complex game and model domains. This paper addresses both tasks by using Bayesian inference in a benchmark space of reference agents. The concepts are explained and demonstrated using the game of chess but the model applies generically to any domain with quantifiable options and fallible choice. Demonstration applications address questions frequently asked by the chess community regarding the stability of the rating scale, the comparison of players of different eras and/or leagues, and controversial incidents possibly involving fraud. The last include alleged under-performance, fabrication of tournament results, and clandestine use of computer advice during competition. Beyond the model world of games, the aim is to improve fallible human performance in complex, high-value tasks

    A fuzzy approach for the network congestion problem

    Get PDF
    In the recent years, the unpredictable growth of the Internet has moreover pointed out the congestion problem, one of the problems that historicallyha ve affected the network. This paper deals with the design and the evaluation of a congestion control algorithm which adopts a FuzzyCon troller. The analogyb etween Proportional Integral (PI) regulators and Fuzzycon trollers is discussed and a method to determine the scaling factors of the Fuzzycon troller is presented. It is shown that the Fuzzycon troller outperforms the PI under traffic conditions which are different from those related to the operating point considered in the design

    Il ponte delle Teste sul fiume Oreto

    Get PDF
    Il fortuito ritrovamento al di sotto del terreno di riporto, di un antico ponte in muratura ancora perfettamente integro, ha sollecitato una ricerca sull\u2019epoca e le tecniche di costruzione adottate. Attraverso l\u2019analisi delle fonti antiche, della cartografia attuale e storica, delle cronache del tempo si \ue8 delineata la storia del sistema di attraversamento del fiume Oreto, l\u2019espansione della citt\ue0 fino all\u2019area del ponte, ritrovando quelle pratiche sociali che hanno fatto attribuire il nome non consueto. Lo studio ha inoltre ripercorso le vicende costruttive delle diverse strutture che si sono susseguite nel sito, dalla fine del XVI secolo a pochi anni addietro. Si sono ritrovate informazioni ed immagini inedite, utili per interpretare le questioni puramente tecniche, insieme a quelle connesse alla viabilit\ue0 ed allo sviluppo della citt\ue0Abstract. The fortuitous discovery below the backfill, of an old still perfectly intact stone bridge, called us for a search on the age and construction techniques employed. Through the analysis of ancient sources, the historical current mapping, chronicles of the time, it has outlined the history of the river Oreto crossing system, the expansion of the city to the area of the bridge, finding those social practices that have made that peculiar name. The study also traced the building phases of the different structures that have taken place on the site, by the end of the sixteenth century, up to a few years ago. We have found new information and unpublished images, useful for interpreting the purely technical issues, along with those related to the viability and development of the city

    Distributed mining of molecular fragments

    Get PDF
    In real world applications sequential algorithms of data mining and data exploration are often unsuitable for datasets with enormous size, high-dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context there is the necessity to develop high performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment
    corecore