3,486 research outputs found

    Representing numeric data in 32 bits while preserving 64-bit precision

    Full text link
    Data files often consist of numbers having only a few significant decimal digits, whose information content would allow storage in only 32 bits. However, we may require that arithmetic operations involving these numbers be done with 64-bit floating-point precision, which precludes simply representing the data as 32-bit floating-point values. Decimal floating point gives a compact and exact representation, but requires conversion with a slow division operation before it can be used. Here, I show that interesting subsets of 64-bit floating-point values can be compactly and exactly represented by the 32 bits consisting of the sign, exponent, and high-order part of the mantissa, with the lower-order 32 bits of the mantissa filled in by table lookup, indexed by bits from the part of the mantissa retained, and possibly from the exponent. For example, decimal data with 4 or fewer digits to the left of the decimal point and 2 or fewer digits to the right of the decimal point can be represented in this way using the lower-order 5 bits of the retained part of the mantissa as the index. Data consisting of 6 decimal digits with the decimal point in any of the 7 positions before or after one of the digits can also be represented this way, and decoded using 19 bits from the mantissa and exponent as the index. Encoding with such a scheme is a simple copy of half the 64-bit value, followed if necessary by verification that the value can be represented, by checking that it decodes correctly. Decoding requires only extraction of index bits and a table lookup. Lookup in a small table will usually reference cache; even with larger tables, decoding is still faster than conversion from decimal floating point with a division operation. I discuss how such schemes perform on recent computer systems, and how they might be used to automatically compress large arrays in interpretive languages such as R

    Second-generation PLINK: rising to the challenge of larger and richer datasets

    Get PDF
    PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for even faster and more scalable implementations of key functions. In addition, GWAS and population-genetic data now frequently contain probabilistic calls, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format. To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(sqrt(n))-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. This will be followed by PLINK 2.0, which will introduce (a) a new data format capable of efficiently representing probabilities, phase, and multiallelic variants, and (b) extensions of many functions to account for the new types of information. The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.Comment: 2 figures, 1 additional fil

    Data Mining the SDSS SkyServer Database

    Full text link
    An earlier paper (Szalay et. al. "Designing and Mining MultiTerabyte Astronomy Archives: The Sloan Digital Sky Survey," ACM SIGMOD 2000) described the Sloan Digital Sky Survey's (SDSS) data management needs by defining twenty database queries and twelve data visualization tasks that a good data management system should support. We built a database and interfaces to support both the query load and also a website for ad-hoc access. This paper reports on the database design, describes the data loading pipeline, and reports on the query implementation and performance. The queries typically translated to a single SQL statement. Most queries run in less than 20 seconds, allowing scientists to interactively explore the database. This paper is an in-depth tour of those queries. Readers should first have studied the companion overview paper Szalay et. al. "The SDSS SkyServer, Public Access to the Sloan Digital Sky Server Data" ACM SIGMOND 2002.Comment: 40 pages, Original source is at http://research.microsoft.com/~gray/Papers/MSR_TR_O2_01_20_queries.do

    Design and analysis of an accelerated seed generation stage for BLASTP on the Mercury system - Master\u27s Thesis, August 2006

    Get PDF
    NCBI BLASTP is a popular sequence analysis tool used to study the evolutionary relationship between two protein sequences. Protein databases continue to grow exponentially as entire genomes of organisms are sequenced, making sequence analysis a computationally demanding task. For example, a search of the E. coli. k12 proteome against the GenBank Non-Redundant database takes 36 hours on a standard workstation. In this thesis, we look to address the problem by accelerating protein searching using Field Programmable Gate Arrays. We focus our attention on the BLASTP heuristic, building on work done earlier to accelerate DNA searching on the Mercury platform. We analyze the performance characteristics of the BLASTP algorithm and explore the design space of the seed generation stage in detail. We propose a hardware/software architecture and evaluate the performance of the individual stage, and its effect on the overall BLASTP pipeline running on the Mercury system. The seed generation stage is 13x faster than the software equivalent, and the integrated BLASTP pipeline is predicted to yield a speedup of 50x over NCBI BLASTP. Mercury BLASTP also shows a 2.5x speed improvement over the only other BLASTP-like accelerator for FPGAs while consuming far fewer logic resources

    Multi-touch Detection and Semantic Response on Non-parametric Rear-projection Surfaces

    Get PDF
    The ability of human beings to physically touch our surroundings has had a profound impact on our daily lives. Young children learn to explore their world by touch; likewise, many simulation and training applications benefit from natural touch interactivity. As a result, modern interfaces supporting touch input are ubiquitous. Typically, such interfaces are implemented on integrated touch-display surfaces with simple geometry that can be mathematically parameterized, such as planar surfaces and spheres; for more complicated non-parametric surfaces, such parameterizations are not available. In this dissertation, we introduce a method for generalizable optical multi-touch detection and semantic response on uninstrumented non-parametric rear-projection surfaces using an infrared-light-based multi-camera multi-projector platform. In this paradigm, touch input allows users to manipulate complex virtual 3D content that is registered to and displayed on a physical 3D object. Detected touches trigger responses with specific semantic meaning in the context of the virtual content, such as animations or audio responses. The broad problem of touch detection and response can be decomposed into three major components: determining if a touch has occurred, determining where a detected touch has occurred, and determining how to respond to a detected touch. Our fundamental contribution is the design and implementation of a relational lookup table architecture that addresses these challenges through the encoding of coordinate relationships among the cameras, the projectors, the physical surface, and the virtual content. Detecting the presence of touch input primarily involves distinguishing between touches (actual contact events) and hovers (near-contact proximity events). We present and evaluate two algorithms for touch detection and localization utilizing the lookup table architecture. One of the algorithms, a bounded plane sweep, is additionally able to estimate hover-surface distances, which we explore for interactions above surfaces. The proposed method is designed to operate with low latency and to be generalizable. We demonstrate touch-based interactions on several physical parametric and non-parametric surfaces, and we evaluate both system accuracy and the accuracy of typical users in touching desired targets on these surfaces. In a formative human-subject study, we examine how touch interactions are used in the context of healthcare and present an exploratory application of this method in patient simulation. A second study highlights the advantages of touch input on content-matched physical surfaces achieved by the proposed approach, such as decreases in induced cognitive load, increases in system usability, and increases in user touch performance. In this experiment, novice users were nearly as accurate when touching targets on a 3D head-shaped surface as when touching targets on a flat surface, and their self-perception of their accuracy was higher

    Frequent itemset mining on multiprocessor systems

    Get PDF
    Frequent itemset mining is an important building block in many data mining applications like market basket analysis, recommendation, web-mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow up to hundreds of gigabytes or even terabytes of data. Hence, efficient algorithms are required to process such large amounts of data. In recent years, there have been many frequent-itemset mining algorithms proposed, which however (1) often have high memory requirements and (2) do not exploit the large degrees of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, however, the use of these data structures force the algorithms to go out-of-core, i.e., they have to access secondary memory, which leads to serious performance degradations. Exploiting available parallelism is further required to mine large datasets because the serial performance of processors almost stopped increasing. Algorithms should therefore exploit the large number of available threads and also the other kinds of parallelism (e.g., vector instruction sets) besides thread-level parallelism. In this work, we tackle the high memory requirements of frequent itemset mining twofold: we (1) compress the datasets being mined because they must be kept in main memory during several mining invocations and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that show a good compression performance on a wide variety of realistic datasets, i.e., the size of the datasets is reduced by up to 6.4x. The encodings can further be applied directly while loading the dataset from disk or network. Since encoding and decoding is repeatedly required for loading and mining the datasets, we reduce its costs by providing parallel encodings that achieve high throughputs for both tasks. For a memory-efficient representation of the mining algorithms’ intermediate data, we propose compact data structures and even employ explicit compression. Both methods together reduce the intermediate data’s size by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined. For coping with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Our algorithms are already single-threaded often up an order of magnitude faster than existing highly optimized algorithms and further scale almost linear on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms that are used for mining of other types of itemsets

    The Interface Region Imaging Spectrograph (IRIS)

    Get PDF
    The Interface Region Imaging Spectrograph (IRIS) small explorer spacecraft provides simultaneous spectra and images of the photosphere, chromosphere, transition region, and corona with 0.33-0.4 arcsec spatial resolution, 2 s temporal resolution and 1 km/s velocity resolution over a field-of-view of up to 175 arcsec x 175 arcsec. IRIS was launched into a Sun-synchronous orbit on 27 June 2013 using a Pegasus-XL rocket and consists of a 19-cm UV telescope that feeds a slit-based dual-bandpass imaging spectrograph. IRIS obtains spectra in passbands from 1332-1358, 1389-1407 and 2783-2834 Angstrom including bright spectral lines formed in the chromosphere (Mg II h 2803 Angstrom and Mg II k 2796 Angstrom) and transition region (C II 1334/1335 Angstrom and Si IV 1394/1403 Angstrom). Slit-jaw images in four different passbands (C II 1330, Si IV 1400, Mg II k 2796 and Mg II wing 2830 Angstrom) can be taken simultaneously with spectral rasters that sample regions up to 130 arcsec x 175 arcsec at a variety of spatial samplings (from 0.33 arcsec and up). IRIS is sensitive to emission from plasma at temperatures between 5000 K and 10 MK and will advance our understanding of the flow of mass and energy through an interface region, formed by the chromosphere and transition region, between the photosphere and corona. This highly structured and dynamic region not only acts as the conduit of all mass and energy feeding into the corona and solar wind, it also requires an order of magnitude more energy to heat than the corona and solar wind combined. The IRIS investigation includes a strong numerical modeling component based on advanced radiative-MHD codes to facilitate interpretation of observations of this complex region. Approximately eight Gbytes of data (after compression) are acquired by IRIS each day and made available for unrestricted use within a few days of the observation.Comment: 53 pages, 15 figure
    • …
    corecore