1,218 research outputs found

On-the-Fly Data Synopses: Efficient Data Exploration in the Simulation Sciences

Author: Ham DA
Heinis T
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/06/2015

As a consequence of ever more powerful computing hardware and increasingly precise instruments, our capacity to produce scientific data by far outpaces our ability to efficiently store and analyse it. Few of today's tools to analyse scientific data are able to handle the deluge captured by instruments or generated by supercomputers. In many scenarios, however, it suffices to analyse a small subset of the data in detail. What scientists analysing the data consequently need are efficient means to explore the full dataset using approximate query results and to identify the subsets of interest. Once found, interesting areas can still be scrutinised using a precise, but also more time-consuming analysis. Data synopses fit the bill as they provide fast (but approximate) query execution on massive amounts of data. Generating data synopses after the data is stored, however, requires us to analyse all the data again, and is thus inefficient. What we propose is to generate the synopsis for simulation applications on-the-fly when the data is captured. Doing so typically means changing the simulation or data capturing code and is tedious and typically just a one-off solution that is not generally applicable. In contrast, our vision gives scientists a high-level language and the infrastructure needed to generate code that creates data synopses on-the-fly, as the simulation runs. In this paper we discuss the data management challenges associated with our approach.

Crossref

Spiral - Imperial College Digital Repository

QUASII: QUery-Aware Spatial Incremental Index.

Author: Ailamaki A
Heinis T
Pavlovic M
Sidlauskas D
Publication venue: OpenProceedings.org
Publication date: 26/03/2018

With large-scale simulations of increasingly detailed models and improvement of data acquisition technologies, massive amounts of data are easily and quickly created and collected. Traditional systems require indexes to be built before analytic queries can be executed efficiently. Such an indexing step requires substantial computing resources and introduces a considerable and growing data-to-insight gap where scientists need to wait before they can perform any analysis. Moreover, scientists often only use a small fraction of the data - the parts containing interesting phenomena - and indexing it fully does not always pay off. In this paper we develop a novel incremental index for the exploration of spatial data. Our approach, QUASII, builds a data-oriented index as a side-effect of query execution. QUASII distributes the cost of indexing across all queries, while building the index structure only for the subset of data queried. It reduces data-to-insight time and curbs the cost of incremental indexing by gradually and partially sorting the data, while producing a data-oriented hierarchical structure at the same time. As our experiments show, QUASII reduces the data-to-insight time by up to a factor of 11.4, while its performance converges to that of the state-of-the-art static indexes.

Infoscience - École polytechnique fédérale de Lausanne

Spiral - Imperial College Digital Repository

Probabilistic Cross-Identification of Cosmic Events

Author: Abbott
Budavári
Dahlen
Drake
Heinis
Luo
Tamás Budavári
Publication venue: 'IOP Publishing'
Publication date: 23/05/2011

We discuss a novel approach to identifying cosmic events in separate and independent observations. In our focus are the true events, such as supernova explosions, that happen once, hence, whose measurements are not repeatable. Their classification and analysis have to make the best use of all the available data. Bayesian hypothesis testing is used to associate streams of events in space and time. Probabilities are assigned to the matches by studying their rates of occurrence. A case study of Type Ia supernovae illustrates how to use lightcurves in the cross-identification process. Constraints from realistic lightcurves happen to be well-approximated by Gaussians in time, which makes the matching process very efficient. Model-dependent associations are computationally more demanding but can further boost our confidence. Comment: 5 pages, 2 figures, accepted to Ap

arXiv.org e-Print Archive

Crossref

Phage selection of cyclic peptide antagonists with increased stability toward intestinal proteases

Author: Baeriswyl Vanessa
Heinis Christian
Publication venue
Publication date: 02/08/2017

The oral delivery of protein and peptide drugs is limited by their proteolytic degradation and the poor absorption across the intestinal epithelia. In this work, we exposed a phage library of small bicyclic peptides (<1.5 kDa) to a pancreatic extract of proteases prior to affinity selection to enrich binders with higher stability in the intestinal environment. Panning with the therapeutic target plasma kallikrein yielded potent inhibitors (Kis between 5.6 and 336 nM) wherein bicyclic peptides isolated with proteolytic pressure were more stable. A proline residue found in a specific position of several resistant bicyclic peptides proved to be a 'protective mark', rendering the bicyclic peptides resistant to significantly higher concentrations of intestinal proteases while essentially retaining their inhibitory activity.

RERO DOC Digital Library

Composition minérale des fourrages consommés par les ruminants domestiques (Mineral composition of forages consumed by domestic ruminants)

Author: Cisse Maïmouna
Guérin Hubert
Heinis V.
Publication venue: 'Acta Dermato-Venereologica'
Publication date: 01/01/1989

Agritrop

Cross-Identification Performance from Simulated Detections: GALEX and SDSS

Author: Alexander S. Szalay
Budavári
Connolly
Davis
Martin
Morrissey
Pons-Bordería
Scoville
Stoughton
Sébastien Heinis
Tamás Budavári
York
Publication venue: 'IOP Publishing'
Publication date: 16/10/2009

We investigate the quality of associations of astronomical sources from multi-wavelength observations using simulated detections that are realistic in terms of their astrometric accuracy, small-scale clustering properties and selection functions. We present a general method to build such mock catalogs for studying associations, and compare the statistics of cross-identifications based on angular separation and Bayesian probability criteria. In particular, we focus on the highly relevant problem of cross-correlating the ultraviolet Galaxy Evolution Explorer (GALEX) and optical Sloan Digital Sky Survey (SDSS) surveys. Using refined simulations of the relevant catalogs, we find that the probability thresholds yield lower contamination of false associations, and are more efficient than angular separation. Our study presents a set of recommended criteria to construct reliable cross-match catalogs between SDSS and GALEX with minimal artifacts. Comment: 7 pages, 9 figures; ApJ in press

arXiv.org e-Print Archive

Crossref

The spectral energy distribution of galaxies at z > 2.5: Implications from the Herschel/SPIRE color-color diagram

Author: Buat V.
Burgarella D.
Ciesla L.
Heinis S.
Hou J. -L.
Shao Z.
Shen S.
Yuan F. -T.
Publication venue: 'EDP Sciences'
Publication date: 24/06/2015

We use the Herschel SPIRE color-color diagram to study the spectral energy distribution (SED) and the redshift estimation of high-z galaxies. We compiled a sample of 57 galaxies with spectroscopically confirmed redshifts and SPIRE detections in all three bands at z=2.5-6.4, and compared their average SPIRE colors with SED templates from local and high-z libraries. We find that local SEDs are inconsistent with high-z observations. The local calibrations of the parameters need to be adjusted to describe the average colors of high-z galaxies. For high-z libraries, the templates with an evolution from z=0 to 3 can well describe the average colors of the observations at high redshift. Using these templates, we defined color cuts to divide the SPIRE color-color diagram into different regions with different mean redshifts. We tested this method and two other color cut methods using a large sample of 783 Herschel-selected galaxies, and find that although these methods can separate the sample into populations with different mean redshifts, the dispersion of redshifts in each population is considerably large. Additional information is needed for better sampling. Comment: 17 pages, 14 figures, accepted for publication in A&

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

HAL AMU

HAL-INSU

HAL: Hyper Article en Ligne