68,367 research outputs found

    Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems

    Full text link
    Two emerging hardware trends will dominate the database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated the performance. In this work we take a new look at the well-known sort-merge join which, so far, has not been in the focus of research in scalable massively parallel multi-core data processing as it was deemed inferior to hash joins. We devise a suite of new massively parallel sort-merge (MPSM) join algorithms that are based on partial partition-based sorting. Contrary to classical sort-merge joins, our MPSM algorithms do not rely on a hard to parallelize final merge step to create one complete sort order. Rather they work on the independently created runs in parallel. This way our MPSM algorithms are NUMA-affine as all the sorting is carried out on local memory partitions. An extensive experimental evaluation on a modern 32-core machine with one TB of main memory proves the competitive performance of MPSM on large main memory databases with billions of objects. It scales (almost) linearly in the number of employed cores and clearly outperforms competing hash join proposals - in particular it outperforms the "cutting-edge" Vectorwise parallel query engine by a factor of four.Comment: VLDB201

    Distributed Management of Massive Data: an Efficient Fine-Grain Data Access Scheme

    Get PDF
    This paper addresses the problem of efficiently storing and accessing massive data blocks in a large-scale distributed environment, while providing efficient fine-grain access to data subsets. This issue is crucial in the context of applications in the field of databases, data mining and multimedia. We propose a data sharing service based on distributed, RAM-based storage of data, while leveraging a DHT-based, natively parallel metadata management scheme. As opposed to the most commonly used grid storage infrastructures that provide mechanisms for explicit data localization and transfer, we provide a transparent access model, where data are accessed through global identifiers. Our proposal has been validated through a prototype implementation whose preliminary evaluation provides promising results

    Proceedings of the ECSCW'95 Workshop on the Role of Version Control in CSCW Applications

    Full text link
    The workshop entitled "The Role of Version Control in Computer Supported Cooperative Work Applications" was held on September 10, 1995 in Stockholm, Sweden in conjunction with the ECSCW'95 conference. Version control, the ability to manage relationships between successive instances of artifacts, organize those instances into meaningful structures, and support navigation and other operations on those structures, is an important problem in CSCW applications. It has long been recognized as a critical issue for inherently cooperative tasks such as software engineering, technical documentation, and authoring. The primary challenge for versioning in these areas is to support opportunistic, open-ended design processes requiring the preservation of historical perspectives in the design process, the reuse of previous designs, and the exploitation of alternative designs. The primary goal of this workshop was to bring together a diverse group of individuals interested in examining the role of versioning in Computer Supported Cooperative Work. Participation was encouraged from members of the research community currently investigating the versioning process in CSCW as well as application designers and developers who are familiar with the real-world requirements for versioning in CSCW. Both groups were represented at the workshop resulting in an exchange of ideas and information that helped to familiarize developers with the most recent research results in the area, and to provide researchers with an updated view of the needs and challenges faced by application developers. In preparing for this workshop, the organizers were able to build upon the results of their previous one entitled "The Workshop on Versioning in Hypertext" held in conjunction with the ECHT'94 conference. The following section of this report contains a summary in which the workshop organizers report the major results of the workshop. The summary is followed by a section that contains the position papers that were accepted to the workshop. The position papers provide more detailed information describing recent research efforts of the workshop participants as well as current challenges that are being encountered in the development of CSCW applications. A list of workshop participants is provided at the end of the report. The organizers would like to thank all of the participants for their contributions which were, of course, vital to the success of the workshop. We would also like to thank the ECSCW'95 conference organizers for providing a forum in which this workshop was possible

    Set-oriented data mining in relational databases

    Get PDF
    Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing.\ud \ud In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily and its performance makes it feasible to build interactive data mining tools for large databases
    • …
    corecore