QUASII: QUery-Aware Spatial Incremental Index.
With large-scale simulations of increasingly detailed models and improvements in data acquisition technologies, massive amounts of data are created and collected easily and quickly. Traditional systems require indexes to be built before analytic queries can be executed efficiently. This indexing step requires substantial computing resources and introduces a considerable and growing data-to-insight gap: scientists must wait before they can perform any analysis. Moreover, scientists often use only a small fraction of the data (the parts containing interesting phenomena), so indexing it fully does not always pay off. In this paper we develop a novel incremental index for the exploration of spatial data. Our approach, QUASII, builds a data-oriented index as a side effect of query execution. QUASII distributes the cost of indexing across all queries, while building the index structure only for the subset of data queried. It reduces data-to-insight time and curbs the cost of incremental indexing by gradually and partially sorting the data, while simultaneously producing a data-oriented hierarchical structure. As our experiments show, QUASII reduces the data-to-insight time by a factor of up to 11.4, while its performance converges to that of state-of-the-art static indexes.
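QUASII's actual index is multi-dimensional and hierarchical, and the abstract does not spell out its algorithm. As a rough illustration of the underlying idea only (indexing as a side effect of queries, by partially sorting just the data a query touches), the sketch below uses classic one-dimensional database cracking. This is a swapped-in simplification, not QUASII itself, and the class and method names are hypothetical.

```python
import bisect


class CrackedColumn:
    """Simplified 1-D database cracking (NOT QUASII's multi-dimensional algorithm).

    Each range query partitions only the piece(s) of the array that contain its
    bounds, so the data becomes progressively more sorted as queries arrive.
    """

    def __init__(self, values):
        self.data = list(values)
        # Parallel sorted lists: for each pivot p at position pos,
        # every value in data[:pos] is < p and every value in data[pos:] is >= p.
        self.pivots = []
        self.positions = []

    def _piece_bounds(self, pivot):
        # Locate the still-unpartitioned piece [lo, hi) that must contain the pivot.
        i = bisect.bisect_left(self.pivots, pivot)
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.data)
        return i, lo, hi

    def _crack(self, pivot):
        """Ensure a boundary exists for `pivot` and return its position."""
        i, lo, hi = self._piece_bounds(pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this value
        # Two-pointer in-place partition of data[lo:hi] around the pivot.
        left, right = lo, hi - 1
        while left <= right:
            if self.data[left] < pivot:
                left += 1
            else:
                self.data[left], self.data[right] = self.data[right], self.data[left]
                right -= 1
        self.pivots.insert(i, pivot)
        self.positions.insert(i, left)
        return left

    def range_query(self, low, high):
        """Return all values v with low <= v < high, cracking the data as a side effect."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]
```

Because each query only partitions the pieces containing its bounds, repeated queries over the same region get cheaper, while regions that are never queried stay unsorted and cost nothing to index.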
Towards batch-processing on cold storage devices
Large amounts of data in storage systems are cold, i.e., Written Once and Read Occasionally (WORO). The rapid growth of massive-scale archival and historical data increases the demand for cheap, petabyte-scale storage for such cold data. A Cold Storage Device (CSD) is a disk-based storage system designed to trade off performance for cost and power efficiency. Inevitably, the design restrictions of CSDs result in performance limitations. These limitations are not a concern for WORO workloads; however, the very low price/performance characteristics of CSDs make them interesting for other applications too, e.g., batch processing. Applications can, however, be very slow on CSDs if they do not take the devices' characteristics into account. In this paper we design two strategies for data partitioning in CSDs, a crucial operation in many batch analytics tasks such as hash joins, near-duplicate detection, and data localization. We show that our strategies can efficiently use CSDs for batch processing of terabyte-scale data, accelerating data partitioning by 3.5x in our experiments.
AKARI/IRC Broadband Mid-infrared data as an indicator of Star Formation Rate
The AKARI/Infrared Camera (IRC) Point Source Catalog provides a large amount of flux data in the S9W (9 μm) and L18W (18 μm) bands. With the goal of constructing star formation rate (SFR) calibrations using IRC data, we analyzed an IR-selected GALEX-SDSS-2MASS-AKARI (IRC/Far-Infrared Surveyor) sample of 153 nearby galaxies. The far-infrared fluxes were obtained from AKARI diffuse maps to correct the underestimation of extended-source fluxes caused by point-spread function photometry. SFRs of these galaxies were derived with the spectral energy distribution fitting program CIGALE. Despite the complex features contained in these bands, both the S9W and L18W emission correlate with the SFR of galaxies. SFR calibrations using S9W and L18W are presented for the first time. These calibrations agree well, within the scatter, with previous works based on Spitzer data, and should be applicable to dust-rich galaxies.
Comment: PASJ, in press
Mass spectrometry of the white adipose metabolome in a hibernating mammal reveals seasonal changes in alternate fuels and carnitine derivatives
Mammalian hibernators undergo substantial changes in metabolic function throughout the seasonal hibernation cycle. We report here the polar metabolomic profile of white adipose tissue isolated from active and hibernating thirteen-lined ground squirrels (Ictidomys tridecemlineatus). Polar compounds in white adipose tissue were extracted from five groups representing different timepoints throughout the seasonal activity-torpor cycle and analyzed using hydrophilic interaction liquid chromatography-mass spectrometry in both positive and negative ion modes. Of the 660 features detected after curation, 224 compounds were annotated. Unsupervised clustering using principal component analysis revealed discrete clusters representing the different seasonal timepoints throughout hibernation. One-way analysis of variance and feature intensity heatmaps revealed metabolites that varied in abundance between active and torpid timepoints. Pathway analysis against the KEGG database demonstrated enrichment of amino acid metabolism, purine metabolism, glycerophospholipid metabolism, and coenzyme A biosynthetic pathways among the identified compounds. Numerous carnitine derivatives and beta-hydroxybutyrate (BHB), a ketone body that serves as an alternate fuel source, were among the molecules elevated during torpor. Elevated levels of the BHB-carnitine conjugate during torpor suggest the synthesis of BHB in white adipose mitochondria, which may contribute directly to elevated levels of circulating BHB during hibernation.
The spectral energy distribution of galaxies at z > 2.5: Implications from the Herschel/SPIRE color-color diagram
We use the Herschel SPIRE color-color diagram to study the spectral energy distribution (SED) and the redshift estimation of high-z galaxies. We compiled a sample of 57 galaxies with spectroscopically confirmed redshifts and SPIRE detections in all three bands at z > 2.5, and compared their average SPIRE colors with SED templates from local and high-z libraries. We find that local SEDs are inconsistent with high-z observations: the local calibrations of the parameters need to be adjusted to describe the average colors of high-z galaxies. Among the high-z libraries, templates that include evolution from z = 0 to 3 describe the average colors of the observations at high redshift well. Using these templates, we defined color cuts that divide the SPIRE color-color diagram into regions with different mean redshifts. We tested this method and two other color-cut methods on a large sample of 783 Herschel-selected galaxies, and found that although these methods can separate the sample into populations with different mean redshifts, the dispersion of redshifts within each population is considerably large. Additional information is needed for better sampling.
Comment: 17 pages, 14 figures, accepted for publication in A&
TRANSFORMERS: Robust spatial joins on non-uniform data distributions
Spatial joins are becoming increasingly ubiquitous in many applications, particularly in the scientific domain. While several approaches have been proposed for joining spatial datasets, each performs best for a particular density ratio between the joined datasets. More generally, no single proposed method can join two spatial datasets efficiently and robustly with respect to their data distributions. Some approaches do well for datasets with contrasting densities, while others do better with similar densities. None of them does well when the datasets have locally divergent data distributions. In this paper we develop TRANSFORMERS, an efficient and robust spatial join approach that is indifferent to such variations of distribution among the joined data. TRANSFORMERS achieves this by departing from the state of the art: it adapts the join strategy and data layout to local density variations among the joined data. It employs a join method based on data-oriented partitioning when joining areas of substantially different local densities, and uses big partitions (as in space-oriented partitioning) when the densities are similar, switching seamlessly between these two strategies at runtime. We experimentally demonstrate that TRANSFORMERS outperforms state-of-the-art approaches by a factor of 2 to 8.
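The abstract does not say how TRANSFORMERS measures local density or exactly when it switches strategies. The sketch below is a hypothetical illustration of the general idea of choosing a join strategy per region. It assumes a uniform grid, per-cell point counts as the density measure, and a made-up ratio threshold of 4; the function names and parameters are illustrative, not the paper's. It only plans a strategy for each cell and neither performs the join nor handles objects that straddle cell boundaries.

```python
from collections import Counter


def cell_of(point, cell_size):
    """Map a 2-D point to the integer coordinates of its (hypothetical) grid cell."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))


def plan_join(points_a, points_b, cell_size=10.0, ratio_threshold=4.0):
    """Label each grid cell that holds data from both inputs with a join strategy.

    Assumption: local density is approximated by per-cell point counts, and
    `ratio_threshold` is an illustrative cutoff, not a value from the paper.
    """
    counts_a = Counter(cell_of(p, cell_size) for p in points_a)
    counts_b = Counter(cell_of(p, cell_size) for p in points_b)
    plan = {}
    # Only cells where both datasets have points can produce join results.
    for cell in counts_a.keys() & counts_b.keys():
        na, nb = counts_a[cell], counts_b[cell]
        ratio = max(na, nb) / min(na, nb)
        # Contrasting local densities -> fine-grained, data-oriented partitioning;
        # similar densities -> large, space-oriented partitions.
        plan[cell] = "data-oriented" if ratio >= ratio_threshold else "space-oriented"
    return plan
```

Cells that contain points from only one dataset cannot contribute join results, so the plan omits them; a real system would also need to revise such a plan as it discovers density changes at runtime.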
Space odyssey: efficient exploration of scientific data.
Advances in data acquisition, through more powerful supercomputers for simulation or sensors with better resolution, help scientists tremendously in understanding natural phenomena. At the same time, however, they leave scientists with a plethora of data and the challenge of analysing it. Ingesting all the data into a database or indexing it for efficient analysis is unlikely to pay off because scientists rarely need to analyse all of the data. Not knowing a priori which parts of the datasets need to be analysed makes the problem challenging, and tools and methods for analysing only subsets of the data are rather rare. In this paper we therefore present Space Odyssey, a novel approach enabling scientists to efficiently explore multiple spatial datasets of massive size. Without any prior information, Space Odyssey incrementally indexes the datasets and optimizes access to datasets that are frequently queried together. As our experiments show, by incrementally indexing the data and changing its layout on disk, Space Odyssey accelerates exploratory analysis of spatial data, substantially reducing query-to-insight time compared to the state of the art.
Data Infrastructure for Medical Research
While we are witnessing rapid growth in data across the sciences and in many applications, this growth is particularly remarkable in the medical domain, be it because of higher-resolution instruments and diagnostic tools (e.g., MRI), new sources of structured data such as activity trackers, or the widespread use of electronic health records, among many others. The sheer volume of the data is not, however, the only challenge to be faced when using medical data for research. Other crucial challenges include data heterogeneity, data quality, and data privacy. In this article, we review solutions addressing these challenges by discussing the current state of the art in data integration, data cleaning, data privacy, and scalable data access and processing in the context of medical data. The techniques and tools we present give practitioners (computer scientists and medical researchers alike) a starting point for understanding the challenges and solutions, and ultimately for analysing medical data and gaining better and quicker insights.