
    On-the-Fly Data Synopses: Efficient Data Exploration in the Simulation Sciences

    As a consequence of ever more powerful computing hardware and increasingly precise instruments, our capacity to produce scientific data far outpaces our ability to store and analyse it efficiently. Few of today's tools for analysing scientific data can handle the deluge captured by instruments or generated by supercomputers. In many scenarios, however, it suffices to analyse a small subset of the data in detail. What scientists consequently need are efficient means to explore the full dataset using approximate query results and to identify the subsets of interest. Once found, interesting areas can still be scrutinised using a precise but more time-consuming analysis. Data synopses fit the bill, as they provide fast (but approximate) query execution on massive amounts of data. Generating data synopses after the data is stored, however, requires analysing all the data again and is thus inefficient. We therefore propose to generate the synopsis for simulation applications on-the-fly, as the data is captured. Doing so typically means changing the simulation or data-capturing code, which is tedious and usually yields a one-off solution that is not generally applicable. In contrast, our vision gives scientists a high-level language and the infrastructure needed to generate code that creates data synopses on-the-fly, as the simulation runs. In this paper we discuss the data management challenges associated with our approach.
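    A minimal Python sketch of the on-the-fly idea, assuming the synopsis is a fixed-range equi-width histogram (only one of many possible synopses; the abstract does not prescribe one) that a simulation loop updates at every time step instead of storing raw values. The names `EquiWidthHistogram` and `run_toy_simulation` are hypothetical, and range estimates assume values are spread uniformly within each bin.

```python
import random
from typing import Iterable, List


class EquiWidthHistogram:
    """Hypothetical on-the-fly synopsis: a fixed-range, equi-width histogram."""

    def __init__(self, lo: float, hi: float, n_bins: int) -> None:
        self.lo, self.hi, self.n_bins = lo, hi, n_bins
        self.width = (hi - lo) / n_bins
        self.counts: List[int] = [0] * n_bins

    def update(self, values: Iterable[float]) -> None:
        """Fold a batch of freshly produced values into the synopsis."""
        for v in values:
            # Values outside [lo, hi] are clamped into the edge bins.
            i = min(max(int((v - self.lo) / self.width), 0), self.n_bins - 1)
            self.counts[i] += 1

    def approx_count(self, a: float, b: float) -> float:
        """Estimate how many values fell in [a, b), assuming uniformity within bins."""
        total = 0.0
        for i, c in enumerate(self.counts):
            left = self.lo + i * self.width
            overlap = max(0.0, min(b, left + self.width) - max(a, left))
            total += c * overlap / self.width
        return total


def run_toy_simulation(steps: int, particles: int, synopsis: EquiWidthHistogram) -> None:
    """Toy random-walk 'simulation' (an assumption) that feeds the synopsis every step."""
    rng = random.Random(0)
    positions = [rng.random() for _ in range(particles)]
    for _ in range(steps):
        positions = [min(1.0, max(0.0, p + rng.gauss(0.0, 0.01))) for p in positions]
        synopsis.update(positions)  # only the synopsis is kept, not the raw positions
```

    After `run_toy_simulation(100, 1000, h)`, a call such as `h.approx_count(0.2, 0.4)` answers a range-count query approximately without revisiting any raw data, which is the trade-off the abstract describes.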

    QUASII: QUery-Aware Spatial Incremental Index.

    With large-scale simulations of increasingly detailed models and improved data acquisition technologies, massive amounts of data are easily and quickly created and collected. Traditional systems require indexes to be built before analytic queries can be executed efficiently. Such an indexing step requires substantial computing resources and introduces a considerable and growing data-to-insight gap, during which scientists must wait before they can perform any analysis. Moreover, scientists often use only a small fraction of the data (the parts containing interesting phenomena), so indexing it fully does not always pay off. In this paper we develop a novel incremental index for the exploration of spatial data. Our approach, QUASII, builds a data-oriented index as a side effect of query execution. QUASII distributes the cost of indexing across all queries, while building the index structure only for the subset of data queried. It reduces data-to-insight time and curbs the cost of incremental indexing by gradually and partially sorting the data, while producing a data-oriented hierarchical structure at the same time. As our experiments show, QUASII reduces the data-to-insight time by up to a factor of 11.4, while its performance converges to that of state-of-the-art static indexes.
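    The principle of building an index as a side effect of queries can be illustrated with a much simpler relative, one-dimensional database cracking. The Python sketch below is an analogue under stated assumptions, not QUASII itself (which is spatial, multi-dimensional and hierarchical): each range query partitions only the pieces of the column it touches, so the data becomes progressively more sorted where queries are actually issued. `CrackedColumn` and its methods are hypothetical names.

```python
import bisect
from typing import List


class CrackedColumn:
    """1-D database-cracking sketch: range queries partially reorganize the data."""

    def __init__(self, values: List[float]) -> None:
        self.data = list(values)
        # Sorted crack pivots; positions[k] is the first index holding a value >= pivots[k].
        self.pivots: List[float] = []
        self.positions: List[int] = []

    def _crack(self, pivot: float) -> int:
        """Ensure the data is partitioned at pivot; return the first index with value >= pivot."""
        k = bisect.bisect_left(self.pivots, pivot)
        if k < len(self.pivots) and self.pivots[k] == pivot:
            return self.positions[k]  # already cracked by an earlier query
        # Only the piece between the neighbouring cracks can straddle the pivot.
        start = self.positions[k - 1] if k > 0 else 0
        end = self.positions[k] if k < len(self.positions) else len(self.data)
        # In-place partition of data[start:end]: values < pivot end up first.
        i, j = start, end - 1
        while i <= j:
            if self.data[i] < pivot:
                i += 1
            else:
                self.data[i], self.data[j] = self.data[j], self.data[i]
                j -= 1
        self.pivots.insert(k, pivot)
        self.positions.insert(k, i)
        return i

    def range_query(self, lo: float, hi: float) -> List[float]:
        """Return all values in [lo, hi); the cracks are kept for future queries."""
        if lo >= hi:
            return []
        a = self._crack(lo)
        b = self._crack(hi)
        return self.data[a:b]
```

    The cost of sorting is thus spread over the queries, and regions nobody queries are never reorganized, which mirrors, in one dimension, the cost distribution the abstract attributes to QUASII.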

    Probabilistic Cross-Identification of Cosmic Events

    We discuss a novel approach to identifying cosmic events in separate and independent observations. Our focus is on true events, such as supernova explosions, that happen only once and whose measurements are therefore not repeatable. Their classification and analysis have to make the best use of all the available data. Bayesian hypothesis testing is used to associate streams of events in space and time. Probabilities are assigned to the matches by studying their rates of occurrence. A case study of Type Ia supernovae illustrates how to use lightcurves in the cross-identification process. Constraints from realistic lightcurves turn out to be well approximated by Gaussians in time, which makes the matching process very efficient. Model-dependent associations are computationally more demanding but can further boost our confidence. Comment: 5 pages, 2 figures, accepted to Ap
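    A short sketch of why Gaussian time constraints make matching cheap. Assuming (this is not stated in the abstract) that each lightcurve constrains the event time to a Gaussian with mean t_i and width s_i, and that event times are a priori uniform over an observing window of length T much longer than s_i, the integral over the unknown event time has a closed form. The Bayes factor for "same event" versus "two unrelated events" then becomes B = T * N(t1 - t2; 0, sqrt(s1^2 + s2^2)). The function names are hypothetical.

```python
import math


def gaussian_pdf(x: float, sigma: float) -> float:
    """Density of a zero-mean normal distribution with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def time_bayes_factor(t1: float, s1: float, t2: float, s2: float, window: float) -> float:
    """Bayes factor for 'same event' vs 'different events' from two time estimates.

    Assumes Gaussian time constraints and a flat prior on event times over
    `window` (window >> s1, s2), so the product of the two Gaussians integrates
    to a single Gaussian in t1 - t2 with the combined width.
    """
    combined = math.hypot(s1, s2)
    return window * gaussian_pdf(t1 - t2, combined)


def match_probability(bayes_factor: float, prior: float) -> float:
    """Posterior match probability from a Bayes factor and an assumed prior."""
    return 1.0 / (1.0 + (1.0 - prior) / (bayes_factor * prior))
```

    No numerical integration is needed per candidate pair, only one Gaussian evaluation; a model-dependent comparison of full lightcurves would replace this closed form with a costlier fit.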

    Phage selection of cyclic peptide antagonists with increased stability toward intestinal proteases

    The oral delivery of protein and peptide drugs is limited by their proteolytic degradation and poor absorption across the intestinal epithelia. In this work, we exposed a phage library of small bicyclic peptides (<1.5 kDa) to a pancreatic extract of proteases prior to affinity selection to enrich binders with higher stability in the intestinal environment. Panning against the therapeutic target plasma kallikrein yielded potent inhibitors (Ki values between 5.6 and 336 nM), and bicyclic peptides isolated under proteolytic pressure were more stable. A proline residue found at a specific position in several resistant bicyclic peptides proved to be a ‘protective mark’, rendering the bicyclic peptides resistant to significantly higher concentrations of intestinal proteases while essentially retaining their inhibitory activity.

    Mineral Composition of Forages Consumed by Domestic Ruminants (original French title: Composition minérale des fourrages consommés par les ruminants domestiques)


    Cross-Identification Performance from Simulated Detections: GALEX and SDSS

    We investigate the quality of associations of astronomical sources from multi-wavelength observations using simulated detections that are realistic in terms of their astrometric accuracy, small-scale clustering properties and selection functions. We present a general method to build such mock catalogs for studying associations, and compare the statistics of cross-identifications based on angular separation and Bayesian probability criteria. In particular, we focus on the highly relevant problem of cross-correlating the ultraviolet Galaxy Evolution Explorer (GALEX) and optical Sloan Digital Sky Survey (SDSS) surveys. Using refined simulations of the relevant catalogs, we find that probability thresholds yield less contamination by false associations and are more efficient than angular-separation criteria. Our study presents a set of recommended criteria for constructing reliable cross-match catalogs between SDSS and GALEX with minimal artifacts. Comment: 7 pages, 9 figures; ApJ in press
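    The two criteria compared here can be sketched side by side. The Python below uses a commonly quoted small-angle form of the positional Bayes factor for two detections with circular Gaussian astrometric errors, B = 2/(s1^2 + s2^2) * exp(-psi^2 / (2 (s1^2 + s2^2))) with the separation psi and errors s in radians, and turns it into a posterior with an assumed prior match probability. The error values, prior and thresholds are inputs chosen by the caller, not the calibrated GALEX/SDSS values, and all function names are hypothetical.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond in radians


def positional_bayes_factor(psi_arcsec: float, sigma1_arcsec: float, sigma2_arcsec: float) -> float:
    """Small-angle Bayes factor for two detections with circular Gaussian errors."""
    s2 = (sigma1_arcsec ** 2 + sigma2_arcsec ** 2) * ARCSEC ** 2
    psi = psi_arcsec * ARCSEC
    return 2.0 / s2 * math.exp(-psi ** 2 / (2.0 * s2))


def posterior(bayes_factor: float, prior: float) -> float:
    """Posterior match probability from a Bayes factor and an assumed prior."""
    return 1.0 / (1.0 + (1.0 - prior) / (bayes_factor * prior))


def is_match_by_separation(psi_arcsec: float, radius_arcsec: float) -> bool:
    """Angular-separation criterion: a fixed radius, blind to per-source errors."""
    return psi_arcsec <= radius_arcsec


def is_match_by_probability(psi_arcsec: float, sigma1_arcsec: float, sigma2_arcsec: float,
                            prior: float, threshold: float) -> bool:
    """Probability criterion: weighs the separation against the combined astrometric error."""
    b = positional_bayes_factor(psi_arcsec, sigma1_arcsec, sigma2_arcsec)
    return posterior(b, prior) >= threshold
```

    The design difference is visible in the signatures: the separation cut takes a single radius, while the probability cut adapts to each pair's astrometric errors, which is one reason it can reject more false associations at the same completeness.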

    The spectral energy distribution of galaxies at z > 2.5: Implications from the Herschel/SPIRE color-color diagram

    We use the Herschel SPIRE color-color diagram to study the spectral energy distribution (SED) and the redshift estimation of high-z galaxies. We compiled a sample of 57 galaxies with spectroscopically confirmed redshifts at z = 2.5–6.4 and SPIRE detections in all three bands, and compared their average SPIRE colors with SED templates from local and high-z libraries. We find that local SEDs are inconsistent with high-z observations: the local calibrations of the parameters need to be adjusted to describe the average colors of high-z galaxies. Among the high-z libraries, templates with an evolution from z = 0 to 3 describe the average colors observed at high redshift well. Using these templates, we defined color cuts that divide the SPIRE color-color diagram into regions with different mean redshifts. We tested this method and two other color-cut methods on a large sample of 783 Herschel-selected galaxies, and find that although these methods can separate the sample into populations with different mean redshifts, the redshift dispersion within each population is considerable. Additional information is needed for better sampling. Comment: 17 pages, 14 figures, accepted for publication in A&A
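    The color-cut idea can be sketched as follows, under loudly stated assumptions: colors are taken as the flux-density ratios S350/S250 and S500/S350, each region is approximated by simple lower bounds on those two ratios (real cuts may be arbitrary lines in the color-color plane), and any cut values passed in are placeholders, not those derived in the paper. The helper reports the mean and dispersion of spectroscopic redshifts per region, the kind of check used to test such cuts. All names are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical cut definition: (region label, minimum S350/S250, minimum S500/S350).
Cut = Tuple[str, float, float]
Fluxes = Tuple[float, float, float]  # (S250, S350, S500), any consistent unit


def spire_colors(s250: float, s350: float, s500: float) -> Tuple[float, float]:
    """Return the two SPIRE flux-density ratios (S350/S250, S500/S350)."""
    return s350 / s250, s500 / s350


def assign_region(fluxes: Fluxes, cuts: Sequence[Cut]) -> str:
    """Return the first region whose lower color bounds the source satisfies."""
    c1, c2 = spire_colors(*fluxes)
    for label, min_c1, min_c2 in cuts:
        if c1 >= min_c1 and c2 >= min_c2:
            return label
    return "unassigned"


def redshift_stats(sample: Sequence[Tuple[Fluxes, float]],
                   cuts: Sequence[Cut]) -> Dict[str, Tuple[float, float]]:
    """Mean and population standard deviation of redshift in each color region."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for fluxes, z in sample:
        groups[assign_region(fluxes, cuts)].append(z)
    return {label: (statistics.mean(zs), statistics.pstdev(zs)) for label, zs in groups.items()}
```

    A large standard deviation in a region means the cut separates mean redshifts but cannot pin down individual redshifts, which is the limitation the abstract reports.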