4,010 research outputs found

    The UARS and open data concept and analysis study

    Alternative concepts for a common design for the UARS and OPEN Central Data Handling Facility (CDHF) are offered. Costs for alternative implementations of the UARS designs are presented, showing that the system design does not restrict the implementation to a single manufacturer. Processing demands on the alternative UARS CDHF implementations are then discussed. With this information, together with estimates of OPEN processing demands, it is shown that any shortfall in system capability for OPEN support can be remedied by either component upgrades or array processing attachments rather than a system redesign. In addition to a common system design, it is shown that there is significant potential for common software design, especially in the areas of data management software and non-user-unique production software. Archiving of the CDHF data is discussed. Following that, cost examples for several modes of communication between the CDHF and Remote User Facilities are presented. Technology application is also discussed.

    Evaluating tradeoff between recall and performance of GPU permutation index

    Query-by-content, by means of similarity search, is a fundamental operation for applications that deal with multimedia data. For this kind of query, it is meaningless to look for elements exactly equal to the query object. Instead, we need to measure the dissimilarity between the query object and each database object. This search problem can be formalized with the concept of a metric space. In this setting, search efficiency is understood as minimizing the number of distance calculations required to answer a query. Building an index can help, but for very large metric databases it is not enough: it is also necessary to speed up queries with high-performance computing, such as GPUs, and in some cases it is reasonable to accept a fast answer even if it is inexact. In this work we evaluate the tradeoff between the answer quality and the time performance of our implementation of the Permutation Index on a pure GPU architecture, used to solve multiple approximate similarity searches on metric databases in parallel.
    WPDP - XIII Workshop procesamiento distribuido y paralelo. Red de Universidades con Carreras en Informática (RedUNCI)
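The recall-versus-cost tradeoff described in this abstract can be illustrated with a minimal sketch of a permutation-based index. It is a sequential, CPU-only Python illustration, not the authors' GPU implementation. The function names, the Euclidean distance, the Spearman footrule used to compare permutations, and the `fraction` parameter are all assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Distance function of the metric space (an assumption for this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def permutation_of(obj, permutants, dist):
    """Return ranks where ranks[i] is the position of permutant i
    when the permutants are ordered by increasing distance to obj."""
    order = sorted(range(len(permutants)), key=lambda i: dist(obj, permutants[i]))
    ranks = [0] * len(permutants)
    for position, i in enumerate(order):
        ranks[i] = position
    return ranks


def footrule(r1, r2):
    """Spearman footrule: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(r1, r2))


def build_index(db, permutants, dist):
    """Store one permutation per database object."""
    return [permutation_of(obj, permutants, dist) for obj in db]


def knn_approx(query, db, index, permutants, dist, k, fraction):
    """Approximate k-NN: rank objects by permutation similarity, then
    compute real distances only for the most promising fraction of the database."""
    q_ranks = permutation_of(query, permutants, dist)
    candidates = sorted(range(len(db)), key=lambda i: footrule(q_ranks, index[i]))
    n_checked = max(k, int(fraction * len(db)))
    checked = candidates[:n_checked]
    return sorted(checked, key=lambda i: dist(query, db[i]))[:k]


def knn_exact(query, db, dist, k):
    """Exact k-NN by exhaustive scan, used as ground truth."""
    return sorted(range(len(db)), key=lambda i: dist(query, db[i]))[:k]


def recall(approx, exact):
    """Fraction of the true neighbours that the approximate answer found."""
    return len(set(approx) & set(exact)) / len(exact)
```

Lowering `fraction` reduces the number of real distance computations per query and may lower recall. Setting it to 1.0 checks every object and reproduces the exact answer.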

    Efficient Cross-Device Query Processing

    The increasing diversity of hardware within a single system promises large performance gains but also poses a challenge for data management systems. Strategies for the efficient use of hardware with large performance differences are still lacking. For example, existing research on GPU-supported data management largely handles the GPU in isolation from the system’s CPU: the GPU is considered the central processor and the CPU is used only to mitigate the GPU’s weaknesses where necessary. To make efficient use of all available devices, we developed a processing strategy that lets unequal devices like GPU and CPU combine their strengths rather than work in isolation. To this end, we decompose relational data into individual bits and place the resulting partitions on the appropriate devices. Operations are processed in phases, each phase executed on one device. This way, we achieve significant performance gains and good load distribution among the available devices in a limited real-life use case. To grow this idea into a generic system, we identify challenges as well as potential hardware configurations and applications that can benefit from this approach.

    Set-oriented data mining in relational databases

    Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing. In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily and its performance makes it feasible to build interactive data mining tools for large databases.
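To make the set-oriented idea concrete, here is a small sketch of frequent-itemset counting built only from sorting, grouping, and a join on transaction id. It is written in the spirit of the approach the abstract describes and is not the paper's SETM algorithm verbatim. The function name `set_oriented_itemsets` and the `(tid, item)` row layout are assumptions, and the join is emulated with a per-transaction lookup rather than a true merge-scan join.

```python
from itertools import groupby


def set_oriented_itemsets(sales, min_support):
    """sales: iterable of (tid, item) rows.
    Returns {itemset tuple: number of supporting transactions}
    for every itemset with at least min_support transactions."""
    r1 = sorted(set(sales))  # base relation sorted on (tid, item)
    # Items per transaction, in sorted order (stands in for the join partner).
    by_tid = {
        tid: [item for _, item in rows]
        for tid, rows in groupby(r1, key=lambda row: row[0])
    }
    # R_k rows have the form (tid, (item_1, ..., item_k)), items increasing.
    rk = [(tid, (item,)) for tid, item in r1]
    frequent = {}
    while rk:
        # Sort on the itemset columns and count each group.
        rk.sort(key=lambda row: row[1])
        counts = {
            items: sum(1 for _ in rows)
            for items, rows in groupby(rk, key=lambda row: row[1])
        }
        supported = {items: c for items, c in counts.items() if c >= min_support}
        frequent.update(supported)
        # Keep only rows whose itemset is supported.
        rk = [row for row in rk if row[1] in supported]
        # Sort on tid and extend each row with a larger item of the same transaction.
        rk.sort()
        rk = [
            (tid, items + (item,))
            for tid, items in rk
            for item in by_tid[tid]
            if item > items[-1]
        ]
    return frequent
```

Each loop iteration corresponds to one pass that could be phrased as a SQL join followed by a `GROUP BY` with a `HAVING` filter, which is what makes this style amenable to query optimization.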

    X-Device Query Processing by Bitwise Distribution

    The diversity of hardware components within a single system calls for strategies for efficient cross-device data processing. For example, existing approaches to CPU/GPU co-processing distribute individual relational operators to the “most appropriate” device. While pleasantly simple, this strategy has a number of problems: it may leave the “inappropriate” devices idle while overloading the “appropriate” device and putting high pressure on the PCI bus. To address these issues, we distribute data among the devices by partially decomposing relations at the granularity of individual bits. Each of the resulting bit-partitions is stored and processed on one of the available devices. Using this strategy, we implemented a processor for spatial range queries that makes efficient use of all available devices. The performance gains achieved indicate that bitwise distribution makes a good cross-device processing strategy.
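One way to picture bit-level partitioning for a range query is sketched below. The abstract does not give the actual scheme, so everything here is an assumption for illustration: the split point `LOW_BITS`, the function names, the two-phase filter-then-refine evaluation, and the use of plain Python lists to stand in for partitions held on different devices. Values are assumed to be non-negative integers.

```python
LOW_BITS = 4  # hypothetical split point between the two bit-partitions


def decompose(values, low_bits=LOW_BITS):
    """Split each value into a high-bit partition and a low-bit partition."""
    mask = (1 << low_bits) - 1
    high = [v >> low_bits for v in values]
    low = [v & mask for v in values]
    return high, low


def range_query(high, low, lo, hi, low_bits=LOW_BITS):
    """Return the row ids whose original value lies in [lo, hi]."""
    # Phase 1: coarse filter using only the high-bit partition.
    # This yields a superset of the answer without touching the low bits.
    lo_h, hi_h = lo >> low_bits, hi >> low_bits
    candidates = [i for i, h in enumerate(high) if lo_h <= h <= hi_h]
    # Phase 2: refine the candidates by reassembling the full value
    # from both partitions and testing the exact predicate.
    result = []
    for i in candidates:
        value = (high[i] << low_bits) | low[i]
        if lo <= value <= hi:
            result.append(i)
    return result
```

In this sketch, each phase reads only one partition in bulk, so the two partitions could in principle live on different devices, with only the candidate row ids passed between them.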