208 research outputs found

    On-the-Fly Data Synopses: Efficient Data Exploration in the Simulation Sciences

    QUASII: QUery-Aware Spatial Incremental Index.

    With large-scale simulations of increasingly detailed models and improvement of data acquisition technologies, massive amounts of data are easily and quickly created and collected. Traditional systems require indexes to be built before analytic queries can be executed efficiently. Such an indexing step requires substantial computing resources and introduces a considerable and growing data-to-insight gap where scientists need to wait before they can perform any analysis. Moreover, scientists often only use a small fraction of the data - the parts containing interesting phenomena - and indexing it fully does not always pay off. In this paper we develop a novel incremental index for the exploration of spatial data. Our approach, QUASII, builds a data-oriented index as a side-effect of query execution. QUASII distributes the cost of indexing across all queries, while building the index structure only for the subset of data queried. It reduces data-to-insight time and curbs the cost of incremental indexing by gradually and partially sorting the data, while producing a data-oriented hierarchical structure at the same time. As our experiments show, QUASII reduces the data-to-insight time by up to a factor of 11.4x, while its performance converges to that of the state-of-the-art static indexes

    Towards batch-processing on cold storage devices

    Large amounts of data in storage systems are cold, i.e., Written Once and Read Occasionally (WORO). The rapid growth of massive-scale archival and historical data increases the demand for petabyte-scale cheap storage for such cold data. A Cold Storage Device (CSD) is a disk-based storage system designed to trade off performance for cost and power efficiency. Inevitably, the design restrictions used in CSDs result in performance limitations. These limitations are not a concern for WORO workloads; however, the very low price/performance characteristics of CSDs make them interesting for other applications too, e.g., batch processes. Applications, however, can be very slow on CSDs if they do not take the devices' characteristics into account. In this paper we design two strategies for data partitioning in CSDs, a crucial operation in many batch analytics tasks like hash-join, near-duplicate detection, and data localization. We show that our strategies can efficiently use CSDs for batch processing of terabyte-scale data by accelerating data partitioning by 3.5x in our experiments.

    AKARI/IRC Broadband Mid-infrared data as an indicator of Star Formation Rate

    The AKARI/Infrared Camera (IRC) Point Source Catalog provides a large amount of flux data at the S9W (9 μm) and L18W (18 μm) bands. With the goal of constructing Star-Formation Rate (SFR) calculations using IRC data, we analyzed an IR-selected GALEX-SDSS-2MASS-AKARI (IRC/Far-Infrared Surveyor) sample of 153 nearby galaxies. The far-infrared fluxes were obtained from AKARI diffuse maps to correct the underestimation for extended sources raised by the point-spread function photometry. SFRs of these galaxies were derived by the spectral energy distribution fitting program CIGALE. In spite of complicated features contained in these bands, both the S9W and L18W emission correlate with the SFR of galaxies. The SFR calibrations using S9W and L18W are presented for the first time. These calibrations agree well with previous works based on Spitzer data within the scatters, and should be applicable to dust-rich galaxies. Comment: PASJ, in press

    Mass spectrometry of the white adipose metabolome in a hibernating mammal reveals seasonal changes in alternate fuels and carnitine derivatives

    Mammalian hibernators undergo substantial changes in metabolic function throughout the seasonal hibernation cycle. We report here the polar metabolomic profile of white adipose tissue isolated from active and hibernating thirteen-lined ground squirrels (Ictidomys tridecemlineatus). Polar compounds in white adipose tissue were extracted from five groups representing different timepoints throughout the seasonal activity-torpor cycle and analyzed using hydrophilic interaction liquid chromatography-mass spectrometry in both the positive and negative ion modes. A total of 224 compounds out of 660 features detected after curation were annotated. Unsupervised clustering using principal component analysis revealed discrete clusters representing the different seasonal timepoints throughout hibernation. One-way analysis of variance and feature intensity heatmaps revealed metabolites that varied in abundance between active and torpid timepoints. Pathway analysis compared against the KEGG database demonstrated enrichment of amino acid metabolism, purine metabolism, glycerophospholipid metabolism, and coenzyme A biosynthetic pathways among our identified compounds. Numerous carnitine derivatives and a ketone that serves as an alternate fuel source, beta-hydroxybutyrate (BHB), were among molecules found to be elevated during torpor. Elevated levels of the BHB-carnitine conjugate during torpor suggest the synthesis of beta-hydroxybutyrate in white adipose mitochondria, which may contribute directly to elevated levels of circulating BHB during hibernation.

    The spectral energy distribution of galaxies at z > 2.5: Implications from the Herschel/SPIRE color-color diagram

    We use the Herschel SPIRE color-color diagram to study the spectral energy distribution (SED) and the redshift estimation of high-z galaxies. We compiled a sample of 57 galaxies with spectroscopically confirmed redshifts and SPIRE detections in all three bands at z = 2.5-6.4, and compared their average SPIRE colors with SED templates from local and high-z libraries. We find that local SEDs are inconsistent with high-z observations. The local calibrations of the parameters need to be adjusted to describe the average colors of high-z galaxies. For high-z libraries, the templates with an evolution from z = 0 to 3 describe the average colors of the observations at high redshift well. Using these templates, we defined color cuts to divide the SPIRE color-color diagram into different regions with different mean redshifts. We tested this method and two other color cut methods using a large sample of 783 Herschel-selected galaxies, and find that although these methods can separate the sample into populations with different mean redshifts, the dispersion of redshifts in each population is considerable. Additional information is needed for better sampling. Comment: 17 pages, 14 figures, accepted for publication in A&

    TRANSFORMERS: Robust spatial joins on non-uniform data distributions

    Spatial joins are becoming increasingly ubiquitous in many applications, particularly in the scientific domain. While several approaches have been proposed for joining spatial datasets, each of them has a strength for a particular type of density ratio among the joined datasets. More generally, no single proposed method can efficiently join two spatial datasets in a robust manner with respect to their data distributions. Some approaches do well for datasets with contrasting densities while others do better with similar densities. None of them does well when the datasets have locally divergent data distributions. In this paper we develop TRANSFORMERS, an efficient and robust spatial join approach that is indifferent to such variations of distribution among the joined data. TRANSFORMERS achieves this feat by departing from the state-of-the-art through adapting the join strategy and data layout to local density variations among the joined data. It employs a join method based on data-oriented partitioning when joining areas of substantially different local densities, whereas it uses big partitions (as in space-oriented partitioning) when the densities are similar, while seamlessly switching among these two strategies at runtime. We experimentally demonstrate that TRANSFORMERS outperforms state-of-the-art approaches by a factor of between 2 and 8

    Space odyssey: efficient exploration of scientific data.

    Advances in data acquisition, through more powerful supercomputers for simulation or sensors with better resolution, help scientists tremendously to understand natural phenomena. At the same time, however, these advances leave them with a plethora of data and the challenge of analysing it. Ingesting all the data in a database or indexing it for efficient analysis is unlikely to pay off because scientists rarely need to analyse all data. Not knowing a priori what parts of the datasets need to be analysed makes the problem challenging. Tools and methods to analyse only subsets of this data are rare. In this paper we therefore present Space Odyssey, a novel approach enabling scientists to efficiently explore multiple spatial datasets of massive size. Without any prior information, Space Odyssey incrementally indexes the datasets and optimizes the access to datasets frequently queried together. As our experiments show, through incrementally indexing and changing the data layout on disk, Space Odyssey accelerates exploratory analysis of spatial data by substantially reducing query-to-insight time compared to the state of the art.

    Data Infrastructure for Medical Research

    While we are witnessing rapid growth in data across the sciences and in many applications, this growth is particularly remarkable in the medical domain, be it because of higher resolution instruments and diagnostic tools (e.g. MRI), new sources of structured data like activity trackers, the wide-spread use of electronic health records and many others. The sheer volume of the data is not, however, the only challenge to be faced when using medical data for research. Other crucial challenges include data heterogeneity, data quality, data privacy and so on. In this article, we review solutions addressing these challenges by discussing the current state of the art in the areas of data integration, data cleaning, data privacy, scalable data access and processing in the context of medical data. The techniques and tools we present will give practitioners — computer scientists and medical researchers alike — a starting point to understand the challenges and solutions and ultimately to analyse medical data and gain better and quicker insights