    On-the-Fly Data Synopses: Efficient Data Exploration in the Simulation Sciences

    Probabilistic Cross-Identification of Cosmic Events

    We discuss a novel approach to identifying cosmic events in separate and independent observations. We focus on true one-off events, such as supernova explosions, which happen once and whose measurements are therefore not repeatable. Their classification and analysis have to make the best use of all the available data. Bayesian hypothesis testing is used to associate streams of events in space and time. Probabilities are assigned to the matches by studying their rates of occurrence. A case study of Type Ia supernovae illustrates how to use lightcurves in the cross-identification process. Constraints from realistic lightcurves turn out to be well approximated by Gaussians in time, which makes the matching process very efficient. Model-dependent associations are computationally more demanding but can further boost our confidence.
    Comment: 5 pages, 2 figures, accepted to Ap
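
As a rough illustration of why Gaussian constraints in time keep the matching cheap, the sketch below computes a Bayes factor for the hypothesis "two detections are the same event" against "they are unrelated". It is not taken from the paper: it assumes Gaussian timing errors, a uniform prior on the true event time over an observing window, and negligible edge effects. The function names and the `window` parameter are hypothetical.

```python
import math


def gaussian_pdf(x, sigma):
    """Density of a zero-mean normal distribution with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def temporal_bayes_factor(t1, sigma1, t2, sigma2, window):
    """Bayes factor for 'same event' versus 'unrelated events'.

    Assumptions (illustrative only, not the paper's model):
      * each detection reports a time with Gaussian uncertainty,
      * the true event time is uniform over an interval of length `window`,
      * `window` is much longer than the uncertainties (edges ignored).

    Integrating out the shared true time under these assumptions gives
    B = window * N(t1 - t2; 0, sigma1^2 + sigma2^2).
    """
    combined_sigma = math.hypot(sigma1, sigma2)  # sqrt(sigma1^2 + sigma2^2)
    return window * gaussian_pdf(t1 - t2, combined_sigma)


# Two detections one day apart, each known to about a day, in a 100-day window.
close_pair = temporal_bayes_factor(10.0, 1.0, 11.0, 1.0, 100.0)
# Two detections twenty days apart with the same uncertainties.
far_pair = temporal_bayes_factor(10.0, 1.0, 30.0, 1.0, 100.0)
print(close_pair > 1.0, far_pair < 1.0)
```

A factor above 1 favours the match and a factor below 1 favours unrelated events. Turning it into a match probability additionally needs a prior, which the abstract says is obtained from the rates of occurrence.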

    QUASII: QUery-Aware Spatial Incremental Index.

    With large-scale simulations of increasingly detailed models and improvements in data acquisition technologies, massive amounts of data are easily and quickly created and collected. Traditional systems require indexes to be built before analytic queries can be executed efficiently. Such an indexing step requires substantial computing resources and introduces a considerable and growing data-to-insight gap, during which scientists must wait before they can perform any analysis. Moreover, scientists often use only a small fraction of the data (the parts containing interesting phenomena), so indexing it fully does not always pay off. In this paper we develop a novel incremental index for the exploration of spatial data. Our approach, QUASII, builds a data-oriented index as a side effect of query execution. QUASII distributes the cost of indexing across all queries, while building the index structure only for the subset of data queried. It reduces data-to-insight time and curbs the cost of incremental indexing by gradually and partially sorting the data, while producing a data-oriented hierarchical structure at the same time. As our experiments show, QUASII reduces the data-to-insight time by up to a factor of 11.4x, while its performance converges to that of the state-of-the-art static indexes.
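
The idea of building an index as a side effect of queries can be sketched in one dimension. The example below is not QUASII, which targets spatial data and builds a hierarchical structure. It is a minimal "cracking"-style sketch in which each range query partitions only the pieces of the array it touches, so the data becomes partially sorted where queries fall. The class and method names are hypothetical.

```python
from bisect import bisect_left


class CrackedColumn:
    """One-dimensional, query-driven partial index (illustrative sketch only)."""

    def __init__(self, values):
        self.data = list(values)  # reordered in place as queries arrive
        self.pivots = []          # sorted boundary values seen in queries
        self.cuts = []            # cuts[i] = first position holding a value >= pivots[i]

    def _crack(self, pivot):
        """Make sure a boundary exists at `pivot` and return its position."""
        i = bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.cuts[i]  # already indexed by an earlier query
        # Only the single piece that contains `pivot` is reorganised.
        lo = self.cuts[i - 1] if i > 0 else 0
        hi = self.cuts[i] if i < len(self.cuts) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        cut = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.cuts.insert(i, cut)
        return cut

    def range_query(self, low, high):
        """Return all values v with low <= v < high (assumes low <= high)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


column = CrackedColumn([7, 2, 9, 4, 1, 8, 3])
first = sorted(column.range_query(3, 8))   # [3, 4, 7]
second = sorted(column.range_query(4, 9))  # [4, 7, 8]
print(first, second, column.pivots)
```

The first query pays for partitioning the whole array. Later queries reuse the stored boundaries and reorganise only smaller pieces, which is the sense in which the indexing cost is spread across queries and limited to the data actually queried.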

    Phage selection of cyclic peptide antagonists with increased stability toward intestinal proteases

    The oral delivery of protein and peptide drugs is limited by their proteolytic degradation and poor absorption across the intestinal epithelia. In this work, we exposed a phage library of small bicyclic peptides (<1.5 kDa) to a pancreatic extract of proteases prior to affinity selection, to enrich binders with higher stability in the intestinal environment. Panning with the therapeutic target plasma kallikrein yielded potent inhibitors (Ki values between 5.6 and 336 nM), and the bicyclic peptides isolated under proteolytic pressure were more stable. A proline residue found in a specific position of several resistant bicyclic peptides proved to be a 'protective mark', rendering the bicyclic peptides resistant to significantly higher concentrations of intestinal proteases while essentially retaining their inhibitory activit

    Towards batch-processing on cold storage devices

    A large amount of the data in storage systems is cold, i.e., Written Once and Read Occasionally (WORO). The rapid growth of massive-scale archival and historical data increases the demand for cheap, petabyte-scale storage for such cold data. A Cold Storage Device (CSD) is a disk-based storage system designed to trade off performance for cost and power efficiency. Inevitably, the design restrictions used in CSDs result in performance limitations. These limitations are not a concern for WORO workloads; however, the very low price/performance characteristics of CSDs make them interesting for other applications too, e.g., batch processes. Applications can nevertheless be very slow on CSDs if they do not take the devices' characteristics into account. In this paper we design two strategies for data partitioning in CSDs, a crucial operation in many batch analytics tasks such as hash-join, near-duplicate detection, and data localization. We show that our strategies can efficiently use CSDs for batch processing of terabyte-scale data, accelerating data partitioning by 3.5x in our experiments.