On-the-Fly Data Synopses: Efficient Data Exploration in the Simulation Sciences
As a consequence of ever more powerful computing hardware and increasingly precise instruments, our capacity to produce scientific data far outpaces our ability to efficiently store and analyse it. Few of today's tools for analysing scientific data can handle the deluge captured by instruments or generated by supercomputers.
In many scenarios, however, it suffices to analyse a small subset of the data in detail. What scientists analysing the data therefore need are efficient means to explore the full dataset using approximate query results and to identify the subsets of interest. Once found, interesting areas can still be scrutinised using a precise, but more time-consuming, analysis. Data synopses fit the bill, as they provide fast (but approximate) query execution on massive amounts of data. Generating data synopses after the data is stored, however, requires analysing all the data again and is thus inefficient.
What we propose is to generate the synopses for simulation applications on-the-fly, as the data is captured. Doing so typically requires changing the simulation or data-capturing code, which is tedious and usually yields a one-off solution that is not generally applicable. In contrast, our vision gives scientists a high-level language and the infrastructure needed to generate code that creates data synopses on-the-fly, as the simulation runs. In this paper we discuss the data management challenges associated with our approach.
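To make the idea of an on-the-fly synopsis concrete, here is a minimal Python sketch that assumes the simplest possible synopsis: an equi-width histogram whose bin counts are updated as each value is produced, so range-count queries can later be answered approximately without rereading stored data. The class `HistogramSynopsis` and its methods are hypothetical illustrations; this is not the high-level language or code generator proposed in the abstract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HistogramSynopsis:
    """Hypothetical equi-width histogram over [lo, hi), updated incrementally."""

    lo: float
    hi: float
    bins: int
    counts: List[int] = field(init=False)
    width: float = field(init=False)

    def __post_init__(self) -> None:
        self.counts = [0] * self.bins
        self.width = (self.hi - self.lo) / self.bins

    def _bin(self, x: float) -> int:
        # Values outside [lo, hi) are clamped into the first or last bin.
        i = int((x - self.lo) / self.width)
        return min(max(i, 0), self.bins - 1)

    def add(self, x: float) -> None:
        """Called once per value as the simulation emits it (no second pass)."""
        self.counts[self._bin(x)] += 1

    def approx_count(self, a: float, b: float) -> float:
        """Approximate count of values in [a, b), assuming values are spread
        uniformly within each bin."""
        total = 0.0
        for i, c in enumerate(self.counts):
            left = self.lo + i * self.width
            right = left + self.width
            overlap = max(0.0, min(b, right) - max(a, left))
            total += c * overlap / self.width
        return total


syn = HistogramSynopsis(lo=0.0, hi=100.0, bins=10)
for step in range(1000):
    # Stand-in for a value produced by one simulation step (hypothetical data).
    syn.add(float(step % 100))
print(syn.approx_count(20.0, 40.0))  # 200.0
```

The histogram answers a range query in time proportional to the number of bins rather than the number of values; the price is the uniform-within-bin assumption, which is exact here only because the toy data are uniform.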
QUASII: QUery-Aware Spatial Incremental Index
With large-scale simulations of increasingly detailed models and improved data acquisition technologies, massive amounts of data are easily and quickly created and collected. Traditional systems require indexes to be built before analytic queries can be executed efficiently. Such an indexing step requires substantial computing resources and introduces a considerable and growing data-to-insight gap, during which scientists must wait before they can perform any analysis. Moreover, scientists often use only a small fraction of the data (the parts containing interesting phenomena), so indexing it fully does not always pay off. In this paper we develop a novel incremental index for the exploration of spatial data. Our approach, QUASII, builds a data-oriented index as a side effect of query execution. QUASII distributes the cost of indexing across all queries, while building the index structure only for the subset of data queried. It reduces data-to-insight time and curbs the cost of incremental indexing by gradually and partially sorting the data, while producing a data-oriented hierarchical structure at the same time. As our experiments show, QUASII reduces the data-to-insight time by up to a factor of 11.4, while its performance converges to that of state-of-the-art static indexes.
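QUASII itself is a multi-dimensional, hierarchical spatial index, and its algorithm is not reproduced here. As a hedged illustration of the underlying idea of indexing as a side effect of querying, the following sketch implements one-dimensional database cracking, a simpler related technique: each range query partitions only the piece of the array it touches, so repeated queries gradually sort the regions of interest while untouched data stays unindexed. `CrackedColumn` is a hypothetical name.

```python
import bisect
from typing import List


class CrackedColumn:
    """Query-driven incremental partitioning of a 1-D column (database cracking)."""

    def __init__(self, values: List[float]) -> None:
        self.data = list(values)
        # Parallel sorted lists: for each pivot p at position pos,
        # data[:pos] < p <= data[pos:].
        self.pivots: List[float] = []
        self.positions: List[int] = []

    def _crack(self, pivot: float) -> int:
        """Ensure a partition boundary at `pivot` and return its position."""
        k = bisect.bisect_left(self.pivots, pivot)
        if k < len(self.pivots) and self.pivots[k] == pivot:
            return self.positions[k]  # boundary already exists: no work
        # Only the piece between the neighbouring boundaries is reorganised.
        lo = self.positions[k - 1] if k > 0 else 0
        hi = self.positions[k] if k < len(self.positions) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        pos = lo + len(smaller)
        self.pivots.insert(k, pivot)
        self.positions.insert(k, pos)
        return pos

    def range_query(self, a: float, b: float) -> List[float]:
        """Return all values v with a <= v < b (assumes a <= b)."""
        start = self._crack(a)
        end = self._crack(b)
        return self.data[start:end]


col = CrackedColumn([7, 2, 9, 4, 1, 8, 3, 6, 5])
first = col.range_query(3, 7)
print(sorted(first))  # [3, 4, 5, 6]
```

The first query costs a pass over the whole array, but later queries only reorganise the piece between neighbouring boundaries, which is how the indexing cost is spread across the query workload.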
Probabilistic Cross-Identification of Cosmic Events
We discuss a novel approach to identifying cosmic events in separate and independent observations. We focus on true transient events, such as supernova explosions, which happen only once and whose measurements therefore cannot be repeated. Their classification and analysis have to make the best use of all the available data. Bayesian hypothesis testing is used to associate streams of events in space and time. Probabilities are assigned to the matches by studying their rates of occurrence. A case study of Type Ia supernovae illustrates how to use lightcurves in the cross-identification process. Constraints from realistic lightcurves turn out to be well approximated by Gaussians in time, which makes the matching process very efficient. Model-dependent associations are computationally more demanding but can further boost our confidence. Comment: 5 pages, 2 figures, accepted to Ap
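One way to see why Gaussian time constraints make matching cheap is to write the Bayes factor for two detections in closed form. Assuming each detection constrains the event time t with a Gaussian N(t_i; t, s_i^2), and that event times are a priori uniform over a window T much longer than s_1 and s_2, the "same event" likelihood integrates to (1/T) N(t_1 - t_2; 0, s_1^2 + s_2^2) while the "unrelated events" likelihood is 1/T^2, giving B = T N(t_1 - t_2; 0, s_1^2 + s_2^2). The Python sketch encodes this under those stated assumptions; it is not necessarily the exact formulation used in the paper, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x: float, var: float) -> float:
    """Density of a zero-mean normal distribution with variance `var` at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def time_bayes_factor(t1: float, s1: float, t2: float, s2: float,
                      window: float) -> float:
    """Bayes factor for 'same event' vs 'unrelated events'.

    Assumes Gaussian time constraints (mean t_i, std s_i) and a uniform
    prior on event time over `window`, with window >> s1, s2.
    """
    return window * gaussian_pdf(t1 - t2, s1 ** 2 + s2 ** 2)


def posterior_same(bayes_factor: float, prior: float) -> float:
    """Posterior probability of a match given the Bayes factor and the prior."""
    return 1.0 / (1.0 + (1.0 - prior) / (bayes_factor * prior))


# Hypothetical detections (days): peak times 100 +/- 2 and 101 +/- 2 in a 365-day window.
b = time_bayes_factor(100.0, 2.0, 101.0, 2.0, window=365.0)
print(b, posterior_same(b, prior=0.01))
```

Because B depends only on the time difference and the summed variances, each candidate pair costs one Gaussian evaluation, with no integration at match time.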
Phage selection of cyclic peptide antagonists with increased stability toward intestinal proteases
The oral delivery of protein and peptide drugs is limited by their proteolytic degradation and poor absorption across the intestinal epithelia. In this work, we exposed a phage library of small bicyclic peptides (<1.5 kDa) to a pancreatic extract of proteases prior to affinity selection, in order to enrich binders with higher stability in the intestinal environment. Panning against the therapeutic target plasma kallikrein yielded potent inhibitors (Ki values between 5.6 and 336 nM), and the bicyclic peptides isolated under proteolytic pressure were more stable. A proline residue found at a specific position in several resistant bicyclic peptides proved to be a 'protective mark', rendering the bicyclic peptides resistant to significantly higher concentrations of intestinal proteases while essentially retaining their inhibitory activity.
Cross-Identification Performance from Simulated Detections: GALEX and SDSS
We investigate the quality of associations of astronomical sources from multi-wavelength observations using simulated detections that are realistic in terms of their astrometric accuracy, small-scale clustering properties and selection functions. We present a general method to build such mock catalogs for studying associations, and compare the statistics of cross-identifications based on angular separation and Bayesian probability criteria. In particular, we focus on the highly relevant problem of cross-correlating the ultraviolet Galaxy Evolution Explorer (GALEX) and optical Sloan Digital Sky Survey (SDSS) surveys. Using refined simulations of the relevant catalogs, we find that the probability thresholds yield lower contamination by false associations and are more efficient than angular-separation criteria. Our study presents a set of recommended criteria to construct reliable cross-match catalogs between SDSS and GALEX with minimal artifacts. Comment: 7 pages, 9 figures; ApJ in press
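The two kinds of criteria compared in the study can be sketched side by side. This Python code assumes circular Gaussian astrometric errors and the small-angle limit, in which the standard two-catalog Bayes factor is B = 2 / (s1^2 + s2^2) * exp(-psi^2 / (2 (s1^2 + s2^2))), with the separation psi and the errors s1, s2 in radians. The coordinates and error values are hypothetical and are not GALEX or SDSS specifications, and the study's own recommended thresholds are not reproduced.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond in radians


def angular_separation(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in radians (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return 2.0 * math.asin(min(1.0, math.sqrt(h)))


def position_bayes_factor(psi: float, sigma1: float, sigma2: float) -> float:
    """Two-source Bayes factor for circular Gaussian errors (radians), small angles."""
    s2 = sigma1 ** 2 + sigma2 ** 2
    return 2.0 / s2 * math.exp(-psi ** 2 / (2.0 * s2))


# Hypothetical pair: a UV and an optical detection 1 arcsec apart, with
# assumed 1-sigma astrometric errors of 0.5 and 0.1 arcsec.
psi = angular_separation(150.0, 2.0, 150.0, 2.0 + 1.0 / 3600.0)
bf = position_bayes_factor(psi, 0.5 * ARCSEC, 0.1 * ARCSEC)
print(psi / ARCSEC, math.log10(bf))
```

A fixed radius accepts or rejects a pair on separation alone, whereas the Bayes factor falls off on the scale of the combined positional errors, so pairs with tight astrometry are effectively held to a tighter radius.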
The spectral energy distribution of galaxies at z > 2.5: Implications from the Herschel/SPIRE color-color diagram
We use the Herschel SPIRE color-color diagram to study the spectral energy distribution (SED) and the redshift estimation of high-z galaxies. We compiled a sample of 57 galaxies with spectroscopically confirmed redshifts and SPIRE detections in all three bands at z > 2.5, and compared their average SPIRE colors with SED templates from local and high-z libraries. We find that local SEDs are inconsistent with high-z observations: the local calibrations of the parameters need to be adjusted to describe the average colors of high-z galaxies. For high-z libraries, templates that evolve from z = 0 to 3 describe the average colors of the high-redshift observations well. Using these templates, we defined color cuts that divide the SPIRE color-color diagram into regions with different mean redshifts. We tested this method and two other color-cut methods on a large sample of 783 Herschel-selected galaxies, and find that although these methods can separate the sample into populations with different mean redshifts, the redshift dispersion within each population is considerably large. Additional information is needed for better sampling. Comment: 17 pages, 14 figures, accepted for publication in A&A
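The color-cut step can be illustrated with a small Python sketch. It assumes the colors are the flux-density ratios S350/S250 and S500/S350 and uses a single rectangular cut; the thresholds `cut_x` and `cut_y`, the function names, and the toy fluxes and redshifts are all made up for illustration and are not the cuts or data from the paper. Reporting the mean and the spread of the spectroscopic redshifts per region mirrors the kind of check the abstract describes.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def spire_colors(s250: float, s350: float, s500: float) -> Tuple[float, float]:
    """Return the flux-density ratios (S350/S250, S500/S350)."""
    return s350 / s250, s500 / s350


def split_by_color_cut(
    sources: Sequence[Tuple[float, float, float, float]],
    cut_x: float,
    cut_y: float,
) -> Dict[str, Tuple[float, float]]:
    """Group spectroscopic redshifts by a hypothetical rectangular color cut.

    Each source is (S250, S350, S500, z_spec). A source is 'red' when both
    ratios exceed the thresholds. Returns (mean z, population std of z)
    for each non-empty region.
    """
    groups: Dict[str, List[float]] = {"red": [], "blue": []}
    for s250, s350, s500, z in sources:
        x, y = spire_colors(s250, s350, s500)
        key = "red" if (x > cut_x and y > cut_y) else "blue"
        groups[key].append(z)
    return {k: (statistics.mean(v), statistics.pstdev(v))
            for k, v in groups.items() if v}


# Toy catalogue (mJy, z_spec); values are invented for illustration only.
toy = [
    (30.0, 20.0, 10.0, 0.8),
    (25.0, 22.0, 14.0, 1.5),
    (20.0, 24.0, 21.0, 3.1),
    (18.0, 25.0, 23.0, 4.0),
]
result = split_by_color_cut(toy, cut_x=1.0, cut_y=0.8)
print(result)
```

Sources whose far-infrared peak is redshifted past 250 and 350 microns have larger ratios, which is why a cut toward the upper right of the diagram selects higher mean redshifts; the per-region spread shows how cleanly it does so.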
