208 research outputs found

On-the-Fly Data Synopses: Efficient Data Exploration in the Simulation Sciences

Authors: Ham DA; Heinis T
Publication venue: Association for Computing Machinery (ACM)
Publication date: 29/06/2015

Spiral - Imperial College Digital Repository

QUASII: QUery-Aware Spatial Incremental Index.

Authors: Ailamaki A; Heinis T; Pavlovic M; Sidlauskas D
Publication venue: OpenProceedings.org
Publication date: 26/03/2018

With large-scale simulations of increasingly detailed models and improvement of data acquisition technologies, massive amounts of data are easily and quickly created and collected. Traditional systems require indexes to be built before analytic queries can be executed efficiently. Such an indexing step requires substantial computing resources and introduces a considerable and growing data-to-insight gap where scientists need to wait before they can perform any analysis. Moreover, scientists often only use a small fraction of the data - the parts containing interesting phenomena - and indexing it fully does not always pay off. In this paper we develop a novel incremental index for the exploration of spatial data. Our approach, QUASII, builds a data-oriented index as a side-effect of query execution. QUASII distributes the cost of indexing across all queries, while building the index structure only for the subset of data queried. It reduces data-to-insight time and curbs the cost of incremental indexing by gradually and partially sorting the data, while producing a data-oriented hierarchical structure at the same time. As our experiments show, QUASII reduces the data-to-insight time by up to a factor of 11.4x, while its performance converges to that of the state-of-the-art static indexes
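The idea in the QUASII abstract of building an index as a side effect of query execution, by gradually and partially sorting the data, can be illustrated with a much simpler one-dimensional sketch. The code below is not QUASII itself (which targets spatial data and a hierarchical structure); it is a minimal cracking-style toy, and the names `CrackedColumn`, `_crack`, and `range_query` are hypothetical.

```python
from bisect import bisect_left, insort


class CrackedColumn:
    """Toy 1-D incremental index: every range query partially sorts the data.

    Illustrative assumption only; this is not the QUASII algorithm.
    """

    def __init__(self, values):
        self.data = list(values)  # reordered in place as queries arrive
        self.cracks = []          # sorted list of (pivot, position) boundaries

    def _crack(self, pivot):
        """Reorder so that data[:pos] < pivot <= data[pos:], and return pos."""
        i = bisect_left(self.cracks, (pivot, -1))
        if i < len(self.cracks) and self.cracks[i][0] == pivot:
            return self.cracks[i][1]  # boundary already known, no work needed
        # Only the piece between the two neighbouring boundaries is touched.
        lo = self.cracks[i - 1][1] if i > 0 else 0
        hi = self.cracks[i][1] if i < len(self.cracks) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        pos = lo + len(smaller)
        insort(self.cracks, (pivot, pos))
        return pos

    def range_query(self, low, high):
        """Return all values v with low <= v < high (assumes low <= high)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


col = CrackedColumn([7, 2, 9, 4, 1, 8, 3])
first = col.range_query(3, 8)   # partitions the whole column around 3 and 8
second = col.range_query(1, 3)  # only touches the piece below the boundary at 3
```

Each query pays only for partitioning the pieces it touches, so regions that are never queried are never sorted, and repeated queries over the same region find their boundaries already in place.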

Infoscience - École polytechnique fédérale de Lausanne

Spiral - Imperial College Digital Repository

Towards batch-processing on cold storage devices

Authors: Hadian A; Heinis T
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 16/04/2018

Large amounts of data in storage systems are cold, i.e., Written Once and Read Occasionally (WORO). The rapid growth of massive-scale archival and historical data increases the demand for petabyte-scale cheap storage for such cold data. A Cold Storage Device (CSD) is a disk-based storage system which is designed to trade off performance for cost and power efficiency. Inevitably, the design restrictions used in CSDs result in performance limitations. These limitations are not a concern for WORO workloads; however, the very low price/performance characteristics of CSDs make them interesting for other applications, e.g., batch processes, too. Applications, however, can be very slow on CSDs if they do not take CSD characteristics into account. In this paper we design two strategies for data partitioning in CSDs, a crucial operation in many batch analytics tasks like hash-join, near-duplicate detection, and data localization. We show that our strategies can efficiently use CSDs for batch processing of terabyte-scale data by accelerating data partitioning by 3.5x in our experiments.
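The abstract does not describe the two CSD-specific strategies, so they cannot be reproduced here. As a rough illustration of the data partitioning operation it refers to (the step that precedes a hash-join, for example), the following is a generic in-memory hash partitioning sketch. The function name `hash_partition` and its parameters are hypothetical, and nothing in it models cold storage behaviour.

```python
import zlib


def hash_partition(records, key, num_partitions):
    """Split records into num_partitions buckets by a hash of their key.

    Generic sketch only; it does not implement the paper's CSD strategies.
    """
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        # for strings, so the same key always maps to the same partition.
        h = zlib.crc32(str(key(rec)).encode("utf-8"))
        partitions[h % num_partitions].append(rec)
    return partitions


recs = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
parts = hash_partition(recs, key=lambda r: r[0], num_partitions=3)
```

Because records with equal keys always end up in the same bucket, each bucket can later be joined or deduplicated independently of the others.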

Crossref

Spiral - Imperial College Digital Repository

AKARI/IRC Broadband Mid-infrared data as an indicator of Star Formation Rate

Authors: Buat V.; Burgarella D.; Giovannoli E.; Heinis S.; Iglesias-Paramo J.; Murata K. L.; Takeuchi T. T.; Yuan F. -T.
Publication venue: Oxford University Press (OUP)
Publication date: 01/09/2011

AKARI/Infrared Camera (IRC) Point Source Catalog provides a large amount of flux data at S9W (9 μm) and L18W (18 μm) bands. With the goal of constructing Star-Formation Rate (SFR) calculations using IRC data, we analyzed an IR selected GALEX-SDSS-2MASS-AKARI (IRC/Far-Infrared Surveyor) sample of 153 nearby galaxies. The far-infrared fluxes were obtained from AKARI diffuse maps to correct the underestimation for extended sources raised by the point-spread function photometry. SFRs of these galaxies were derived by the spectral energy distribution fitting program CIGALE. In spite of complicated features contained in these bands, both the S9W and L18W emission correlate with the SFR of galaxies. The SFR calibrations using S9W and L18W are presented for the first time. These calibrations agree well with previous works based on Spitzer data within the scatters, and should be applicable to dust-rich galaxies. Comment: PASJ, in pres

arXiv.org e-Print Archive

Crossref

HAL AMU

HAL-INSU

Mass spectrometry of the white adipose metabolome in a hibernating mammal reveals seasonal changes in alternate fuels and carnitine derivatives

Authors: Alvarez Sophie; Andrews Matthew T.; Heinis Frazer I.
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 28/06/2023

Mammalian hibernators undergo substantial changes in metabolic function throughout the seasonal hibernation cycle. We report here the polar metabolomic profile of white adipose tissue isolated from active and hibernating thirteen-lined ground squirrels (Ictidomys tridecemlineatus). Polar compounds in white adipose tissue were extracted from five groups representing different timepoints throughout the seasonal activity-torpor cycle and analyzed using hydrophilic interaction liquid chromatography-mass spectrometry in both the positive and negative ion modes. A total of 224 compounds out of 660 features detected after curation were annotated. Unsupervised clustering using principal component analysis revealed discrete clusters representing the different seasonal timepoints throughout hibernation. One-way analysis of variance and feature intensity heatmaps revealed metabolites that varied in abundance between active and torpid timepoints. Pathway analysis compared against the KEGG database demonstrated enrichment of amino acid metabolism, purine metabolism, glycerophospholipid metabolism, and coenzyme A biosynthetic pathways among our identified compounds. Numerous carnitine derivatives and a ketone that serves as an alternate fuel source, beta-hydroxybutyrate (BHB), were among molecules found to be elevated during torpor. Elevated levels of the BHB-carnitine conjugate during torpor suggest the synthesis of beta-hydroxybutyrate in white adipose mitochondria, which may contribute directly to elevated levels of circulating BHB during hibernation.

DigitalCommons@University of Nebraska

The spectral energy distribution of galaxies at z > 2.5: Implications from the Herschel/SPIRE color-color diagram

Authors: Buat V.; Burgarella D.; Ciesla L.; Heinis S.; Hou J. -L.; Shao Z.; Shen S.; Yuan F. -T.
Publication venue: EDP Sciences
Publication date: 24/06/2015

We use the Herschel SPIRE color-color diagram to study the spectral energy distribution (SED) and the redshift estimation of high-z galaxies. We compiled a sample of 57 galaxies with spectroscopically confirmed redshifts and SPIRE detections in all three bands at z = 2.5-6.4, and compared their average SPIRE colors with SED templates from local and high-z libraries. We find that local SEDs are inconsistent with high-z observations. The local calibrations of the parameters need to be adjusted to describe the average colors of high-z galaxies. For high-z libraries, the templates with an evolution from z=0 to 3 can well describe the average colors of the observations at high redshift. Using these templates, we defined color cuts to divide the SPIRE color-color diagram into different regions with different mean redshifts. We tested this method and two other color cut methods using a large sample of 783 Herschel-selected galaxies, and find that although these methods can separate the sample into populations with different mean redshifts, the dispersion of redshifts in each population is considerably large. Additional information is needed for better sampling. Comment: 17 pages, 14 figures, accepted for publication in A&

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

HAL AMU

HAL-INSU

TRANSFORMERS: Robust spatial joins on non-uniform data distributions

Authors: Ailamaki A; Heinis T; Karras P; Pavlovic M; Tauheed F
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 14/12/2015

Spatial joins are becoming increasingly ubiquitous in many applications, particularly in the scientific domain. While several approaches have been proposed for joining spatial datasets, each of them has a strength for a particular type of density ratio among the joined datasets. More generally, no single proposed method can efficiently join two spatial datasets in a robust manner with respect to their data distributions. Some approaches do well for datasets with contrasting densities while others do better with similar densities. None of them does well when the datasets have locally divergent data distributions. In this paper we develop TRANSFORMERS, an efficient and robust spatial join approach that is indifferent to such variations of distribution among the joined data. TRANSFORMERS achieves this feat by departing from the state-of-the-art through adapting the join strategy and data layout to local density variations among the joined data. It employs a join method based on data-oriented partitioning when joining areas of substantially different local densities, whereas it uses big partitions (as in space-oriented partitioning) when the densities are similar, while seamlessly switching among these two strategies at runtime. We experimentally demonstrate that TRANSFORMERS outperforms state-of-the-art approaches by a factor of between 2 and 8

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Spiral - Imperial College Digital Repository

Space odyssey: efficient exploration of scientific data.

Authors: Ailamaki A; Heinis T; Pavlovic M; Sidlauskas D; Zacharatou ET
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/06/2016

Advances in data acquisition, through more powerful supercomputers for simulation or sensors with better resolution, help scientists tremendously to understand natural phenomena. At the same time, however, they leave scientists with a plethora of data and the challenge of analysing it. Ingesting all the data in a database or indexing it for an efficient analysis is unlikely to pay off because scientists rarely need to analyse all data. Not knowing a priori what parts of the datasets need to be analysed makes the problem challenging. Tools and methods to analyse only subsets of this data are rather rare. In this paper we therefore present Space Odyssey, a novel approach enabling scientists to efficiently explore multiple spatial datasets of massive size. Without any prior information, Space Odyssey incrementally indexes the datasets and optimizes the access to datasets frequently queried together. As our experiments show, through incrementally indexing and changing the data layout on disk, Space Odyssey accelerates exploratory analysis of spatial data by substantially reducing query-to-insight time compared to the state of the art.

Infoscience - École polytechnique fédérale de Lausanne

Spiral - Imperial College Digital Repository

Data Infrastructure for Medical Research

Authors: Ailamaki A; Heinis T
Publication venue: Now Publishers
Publication date: 01/01/2017

While we are witnessing rapid growth in data across the sciences and in many applications, this growth is particularly remarkable in the medical domain, be it because of higher resolution instruments and diagnostic tools (e.g. MRI), new sources of structured data like activity trackers, the wide-spread use of electronic health records and many others. The sheer volume of the data is not, however, the only challenge to be faced when using medical data for research. Other crucial challenges include data heterogeneity, data quality, data privacy and so on. In this article, we review solutions addressing these challenges by discussing the current state of the art in the areas of data integration, data cleaning, data privacy, scalable data access and processing in the context of medical data. The techniques and tools we present will give practitioners — computer scientists and medical researchers alike — a starting point to understand the challenges and solutions and ultimately to analyse medical data and gain better and quicker insights

Spiral - Imperial College Digital Repository

CERN Document Server