    Distributed Management of Massive Data: an Efficient Fine-Grain Data Access Scheme

    This paper addresses the problem of efficiently storing and accessing massive data blocks in a large-scale distributed environment, while providing efficient fine-grain access to data subsets. This issue is crucial for applications in the fields of databases, data mining and multimedia. We propose a data sharing service based on distributed, RAM-based storage of data, combined with a DHT-based, natively parallel metadata management scheme. As opposed to the most commonly used grid storage infrastructures, which provide mechanisms for explicit data localization and transfer, we provide a transparent access model in which data are accessed through global identifiers. Our proposal has been validated through a prototype implementation whose preliminary evaluation provides promising results.
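    The access model summarized in this abstract (chunks of a large block spread over RAM-based providers, metadata kept in a hash-based index, reads addressed by a global identifier) can be illustrated with a minimal single-process sketch. It is not the authors' prototype: the names `BlobStore`, `CHUNK_SIZE`, `write` and `read` are hypothetical, plain dictionaries stand in for both the storage nodes and the DHT, and placement by SHA-256 hash is an assumption made only for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # bytes per chunk; deliberately tiny so the example is easy to trace


class BlobStore:
    """Toy model: chunked blobs on in-memory providers, located via a metadata map."""

    def __init__(self, num_providers=3):
        # Each dict stands in for one RAM-based storage node (assumption).
        self.providers = [dict() for _ in range(num_providers)]
        # (blob_id, chunk_index) -> provider index; a plain dict stands in for a DHT.
        self.metadata = {}
        self.sizes = {}

    def _place(self, blob_id, index):
        # Hash-based placement spreads chunks over providers (illustrative choice).
        digest = hashlib.sha256(f"{blob_id}:{index}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % len(self.providers)

    def write(self, blob_id, data):
        """Split data into fixed-size chunks and record where each chunk lives."""
        self.sizes[blob_id] = len(data)
        for start in range(0, len(data), CHUNK_SIZE):
            index = start // CHUNK_SIZE
            provider = self._place(blob_id, index)
            self.providers[provider][(blob_id, index)] = data[start:start + CHUNK_SIZE]
            self.metadata[(blob_id, index)] = provider

    def read(self, blob_id, offset, size):
        """Return bytes [offset, offset + size), touching only the chunks that overlap."""
        end = min(offset + size, self.sizes[blob_id])
        if offset >= end:
            return b""
        first, last = offset // CHUNK_SIZE, (end - 1) // CHUNK_SIZE
        parts = []
        for index in range(first, last + 1):
            provider = self.metadata[(blob_id, index)]
            parts.append(self.providers[provider][(blob_id, index)])
        skip = offset - first * CHUNK_SIZE
        return b"".join(parts)[skip:skip + (end - offset)]


store = BlobStore()
store.write("blob-1", b"abcdefghijklmnop")
subset = store.read("blob-1", 5, 6)  # fetches only chunks 1 and 2
```

    The caller names only the global identifier and a byte range; which provider holds each chunk is resolved through the metadata map rather than by the caller.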

    A Parallel Implementation of the K Nearest Neighbours Classifier in Three Levels: Threads MPI Processes and the Grid

    The work described in this paper tackles the problem of data mining and classification of large amounts of data using the K nearest neighbours classifier (KNN) [1]. The large computing demand of this process is met with a parallel implementation specially designed to work in Grid environments of multiprocessor computer farms. The different parallel computing approaches (intra-node, inter-node and inter-organisation) are not sufficient by themselves to face the computing demand of such a big problem. Instead of using parallel techniques separately, we propose to combine the three of them, considering the parallelism grain of the different parts of the problem. The main purpose is to complete a one-month CPU job in a few hours. The technologies used are the EGEE Grid Computing Infrastructure running the Large Hadron Collider Computing Grid (LCG 2.6) middleware [3], MPI [4] [5] and POSIX threads [6]. Finally, we compare the results obtained with the most popular tools to understand the importance of this strategy.

    Aparicio Pla, G.; Blanquer Espert, I.; Hernández García, V. (2007). A Parallel Implementation of the K Nearest Neighbours Classifier in Three Levels: Threads MPI Processes and the Grid. In High Performance Computing for Computational Science - VECPAR 2006. Springer Verlag (Germany). 225-235. doi:10.1007/978-3-540-71351-7_18

    [1] Cover, T.M., Hart, P.E.: Nearest neighbour pattern recognition. IEEE Trans. on Information Theory 13(1), 21-27 (1967)
    [2] Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications 15(3) (2001), http://www.globus.org/research/papers/anatomy.pdf
    [3] LCG: World Wide Web Computing Grid. Distributed Production Environment of Physics Data Processing. http://lcg.web.cern.ch/LCG
    [4] Message Passing Interface Forum: MPI: A message-passing interface standard (2003), http://www.mpi-forum.org/
    [5] Gropp, W., et al.: MPI: The Complete Reference. MIT Press, Cambridge (1998)
    [6] Drepper, U., Molnar, I.: The Native POSIX Thread Library for Linux (2003), http://people.redhat.com/drepper/nptl-design.pdf
    [7] Frank, E., Hall, M., L.T.: Weka 3: Data Mining Software in Java (2005), http://www.cs.waikato.ac.nz/ml/wek
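    The idea of splitting a KNN classification across parallel workers, as outlined in the abstract for this entry, can be sketched at a single level as follows. This is not the paper's implementation: it shows only one level of partitioning, with a Python thread pool standing in for the threads, MPI processes and Grid jobs that the authors combine. The function names `local_knn` and `parallel_knn_classify`, the Euclidean distance and the majority vote are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import heapq
import math


def local_knn(chunk, query, k):
    """Return the k nearest (distance, label) pairs within one partition."""
    scored = ((math.dist(point, query), label) for point, label in chunk)
    return heapq.nsmallest(k, scored)


def parallel_knn_classify(training, query, k=3, workers=4):
    """Classify `query` by majority vote among its k nearest training points."""
    # Split the training set into roughly equal partitions, one per worker.
    size = max(1, math.ceil(len(training) / workers))
    chunks = [training[i:i + size] for i in range(0, len(training), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda chunk: local_knn(chunk, query, k), chunks)
        # The global k nearest are always among the per-partition k nearest,
        # so merging the partial results loses nothing.
        nearest = heapq.nsmallest(k, (pair for part in partials for pair in part))
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
            ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
label = parallel_knn_classify(training, (0.2, 0.1), k=3, workers=2)
```

    Each worker only needs its own partition and the query, and the merge step handles at most k candidates per partition, which is why the same pattern can in principle be repeated at coarser levels. In CPython the thread pool mainly illustrates the structure, since pure-Python distance computations do not run in parallel across threads.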

    An Efficient Load Balancing Multi-core Frequent Patterns Mining Algorithm

    Mining frequent patterns from transactional databases is an important problem in data mining. Many methods have been proposed to solve this problem. However, the computation time still increases significantly as the data size grows. Parallel computing is therefore a good strategy for this problem. Researchers have proposed various parallel and distributed algorithms for cluster and grid systems, but the construction and maintenance cost of such systems is high. In this paper, a multi-core load balancing frequent pattern mining algorithm is presented. The main goal of the proposed algorithm is to reduce the massive number of duplicated candidates generated by previous methods. To verify its performance, we implemented the proposed algorithm as well as previous methods for comparison. The experimental results showed that our method could reduce the computation time dramatically with more threads. Moreover, we observed that the workload was dispatched equally to each computing unit.