171,838 research outputs found

    Dynamic load balancing for the distributed mining of molecular structures

    In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.

    Efficient mining of discriminative molecular fragments

    Frequent pattern discovery in structured data is receiving increasing attention in many areas of science. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context, high-performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimation is available for task workloads, which show a power-law distribution over a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset.

    Implementing Graph Pattern Mining for Big Data in the Cloud

    With the increasing popularity of social networking sites, the data associated with them is growing explosively, so mining big data has become an important problem in graph pattern mining research. Graph mining helps to explore patterns in networks or databases. Until now, various graph mining techniques have existed for mining frequent patterns in graph databases that contain relatively small graphs. But with the rapid arrival of the era of big data, traditional graph mining approaches have been unable to meet the needs of large-scale data analysis. In this context, this paper proposes an adaptation of the big graph data mining approach, especially in the field of social networks. The proposed approach is based on the Hadoop platform and improves efficiency by processing big data in a distributed fashion. The proposed approach can also be adapted to a cloud environment, which offers the merits of load balancing, scalability, and efficiency. Experiments have been conducted with a real Facebook data set. The approach can also be adapted to data sets larger than the one used in the experiments. DOI: 10.17762/ijritcc2321-8169.150514

    Algorithms for Extracting Frequent Episodes in the Process of Temporal Data Mining

    An important aspect of the data mining process is the discovery of patterns that have a great influence on the studied problem. The purpose of this paper is to study frequent episode mining through the use of parallel pattern discovery algorithms. Parallel pattern discovery algorithms offer better performance and scalability, so they are of great interest to the data mining research community. The paper highlights some parallel and distributed frequent pattern mining algorithms on various platforms and presents a comparative study of their main features. The study takes into account the new possibilities that arise with the emerging Compute Unified Device Architecture from the latest generation of graphics processing units. Based on their high performance, low cost, and the increasing number of features offered, GPU processors are viable solutions for an optimal implementation of frequent pattern mining algorithms. Keywords: Frequent Pattern Mining, Parallel Computing, Dynamic Load Balancing, Temporal Data Mining, CUDA, GPU, Fermi, Thread

    Balancing Leisure and Work: Evidence from the Seasonal Home

    Seasonal homes are used during leisure time for many recreational activities, yet recent technological innovations have diminished the separation between the workplace and the seasonal home. In a survey of Walworth County seasonal home owners, most who work full time report they seldom work during vacations and weekends from their seasonal home. Yet there is a distinct subgroup who do mix work into weekends and vacations for a variety of reasons. The most frequent reasons given by these people for working from the seasonal home were related to the expectations of coworkers and clients. Understanding more about the habits and motivations of those who frequently work during weekends and on vacations could provide a new perspective on the obstacles everyone faces in balancing work and leisure.

    An efficient parallel method for mining frequent closed sequential patterns

    Mining frequent closed sequential patterns (FCSPs) has attracted a great deal of research attention because it is an important task in sequence mining. Recently, many studies have focused on mining frequent closed sequential patterns because such patterns have proved to be more efficient and compact than frequent sequential patterns. Information can be fully extracted from frequent closed sequential patterns. In this paper, we propose an efficient parallel approach called parallel dynamic bit vector frequent closed sequential patterns (pDBV-FCSP), which uses a multi-core processor architecture for mining FCSPs from large databases. pDBV-FCSP divides the search space to reduce the required storage space and performs closure checking of prefix sequences early to reduce the execution time for mining frequent closed sequential patterns. This approach overcomes problems of parallel mining such as the overhead of communication, synchronization, and data replication. It also solves the load balancing issue between processors with a dynamic mechanism that redistributes the work when some processes run out of work, minimizing idle CPU time. Indexed in: Web of Science.

    Best practices for HPM-assisted performance engineering on modern multicore processors

    Many tools and libraries employ hardware performance monitoring (HPM) on modern processors, and using this data for performance assessment and as a starting point for code optimizations is very popular. However, such data is only useful if it is interpreted with care, and if the right metrics are chosen for the right purpose. We demonstrate the sensible use of hardware performance counters in the context of a structured performance engineering approach for applications in computational science. Typical performance patterns and their respective metric signatures are defined, and some of them are illustrated using case studies. Although these generic concepts do not depend on specific tools or environments, we restrict ourselves to modern x86-based multicore processors and use the likwid-perfctr tool under the Linux OS. Comment: 10 pages, 2 figures

    Contrasting patterns of selection between MHC I and II across populations of Humboldt and Magellanic penguins

    Indexed in: Web of Science. The evolutionary and adaptive potential of populations or species facing an emerging infectious disease depends on their genetic diversity in genes such as the major histocompatibility complex (MHC). In birds, MHC class I deals predominantly with intracellular infections (e.g., viruses) and MHC class II with extracellular infections (e.g., bacteria). Therefore, patterns of MHC I and II diversity may differ between species and across populations of species depending on the relative effect of local and global environmental selective pressures, genetic drift, and gene flow. We hypothesize that high gene flow among populations of Humboldt and Magellanic penguins limits local adaptation in MHC I and MHC II, and that signatures of selection differ between markers, locations, and species. We evaluated MHC I and II diversity using 454 next-generation sequencing of 100 Humboldt and 75 Magellanic penguins from seven different breeding colonies. Higher genetic diversity was observed in MHC I than in MHC II for both species, explained by the identification of more than one MHC I locus. Large population sizes, high gene flow, and/or similar selection pressures maintain diversity but limit local adaptation in MHC I. A pattern of isolation by distance was observed for MHC II in Humboldt penguins, suggesting local adaptation, mainly at the northernmost studied locality. Furthermore, trans-species alleles were found, due to either recent speciation in the genus or convergent evolution. The high MHC I and MHC II gene diversity described is extremely advantageous for the long-term survival of the species. http://onlinelibrary.wiley.com/doi/10.1002/ece3.2502/epd