Data Provenance and Management in Radio Astronomy: A Stream Computing Approach
New approaches for data provenance and data management (DPDM) are required
for mega science projects like the Square Kilometer Array, characterized by
extremely large data volume and intense data rates, therefore demanding
innovative and highly efficient computational paradigms. In this context, we
explore a stream-computing approach with the emphasis on the use of
accelerators. In particular, we make use of a new generation of high
performance stream-based parallelization middleware known as InfoSphere
Streams. Its viability for managing and ensuring interoperability and integrity
of signal processing data pipelines is demonstrated in radio astronomy. IBM
InfoSphere Streams embraces the stream-computing paradigm. It is a shift from
conventional data mining techniques (involving analysis of existing data from
databases) towards real-time analytic processing. We discuss using InfoSphere
Streams for effective DPDM in radio astronomy and propose a way in which
InfoSphere Streams can be utilized for large antennae arrays. We present a
case-study: the InfoSphere Streams implementation of an autocorrelating
spectrometer, and using this example we discuss the advantages of the
stream-computing approach and the utilization of hardware accelerators.
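As a rough illustration of what an autocorrelating spectrometer computes, the sketch below estimates the autocorrelation of a sampled signal and converts it to a power spectrum through the Wiener-Khinchin relation. It is plain batch Python, not InfoSphere Streams code, and it does not reproduce the implementation described in the abstract; the function names, lag count, and test signal are all assumptions made for the example.

```python
import math


def autocorrelation(samples, max_lag):
    """Biased autocorrelation estimate r[k] for lags k = 0 .. max_lag - 1."""
    n = len(samples)
    return [
        sum(samples[i] * samples[i + k] for i in range(n - k)) / n
        for k in range(max_lag)
    ]


def power_spectrum(acf):
    """Power spectrum of a real signal from its one-sided autocorrelation.

    Uses the even symmetry r[-k] = r[k], so the Fourier transform reduces to
    a cosine sum. Bin m corresponds to m / (2 * L) cycles per sample, where
    L = len(acf).
    """
    lags = len(acf)
    return [
        acf[0] + 2.0 * sum(
            acf[k] * math.cos(math.pi * m * k / lags) for k in range(1, lags)
        )
        for m in range(lags)
    ]


def peak_bin(spectrum):
    """Index of the strongest spectral bin."""
    return max(range(len(spectrum)), key=lambda m: spectrum[m])


if __name__ == "__main__":
    # Hypothetical test tone at 0.125 cycles per sample.
    tone = [math.cos(2.0 * math.pi * 0.125 * t) for t in range(512)]
    spectrum = power_spectrum(autocorrelation(tone, 32))
    print("strongest bin:", peak_bin(spectrum))
```

In a streaming setting the lag products would be accumulated incrementally as samples arrive, and the transform applied once per integration period; the batch form here only shows the arithmetic.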
Database Learning: Toward a Database that Becomes Smarter Every Time
In today's databases, previous query answers rarely benefit answering future
queries. For the first time, to the best of our knowledge, we change this
paradigm in an approximate query processing (AQP) context. We make the
following observation: the answer to each query reveals some degree of
knowledge about the answer to another query because their answers stem from the
same underlying distribution that has produced the entire dataset. Exploiting
and refining this knowledge should allow us to answer queries more
analytically, rather than by reading enormous amounts of raw data. Also,
processing more queries should continuously enhance our knowledge of the
underlying distribution, and hence lead to increasingly faster response times
for future queries.
We call this novel idea, learning from past query answers, Database
Learning. We exploit the principle of maximum entropy to produce answers, which
are in expectation guaranteed to be more accurate than existing sample-based
approximations. Empowered by this idea, we build a query engine on top of Spark
SQL, called Verdict. We conduct extensive experiments on real-world query
traces from a large customer of a major database vendor. Our results
demonstrate that Verdict supports 73.7% of these queries, speeding them up by
up to 23.0x for the same accuracy level compared to existing AQP systems.
Comment: This manuscript is an extended report of the work published in ACM
SIGMOD conference 201
Fast and Simple Relational Processing of Uncertain Data
This paper introduces U-relations, a succinct and purely relational
representation system for uncertain databases. U-relations support
attribute-level uncertainty using vertical partitioning. If we consider
positive relational algebra extended by an operation for computing possible
answers, a query on the logical level can be translated into, and evaluated as,
a single relational algebra query on the U-relation representation. The
translation scheme essentially preserves the size of the query in terms of
number of operations and, in particular, number of joins. Standard techniques
employed in off-the-shelf relational database management systems are effective
for optimizing and processing queries on U-relations. In our experiments we
show that query evaluation on U-relations scales to large amounts of data with
high degrees of uncertainty.
Comment: 12 pages, 14 figures
Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)
Database management systems (DBMSs) carefully optimize complex multi-join
queries to avoid expensive disk I/O. As servers today feature tens or hundreds
of gigabytes of RAM, a significant fraction of many analytic databases becomes
memory-resident. Even after careful tuning for an in-memory environment, a
linear disk I/O model such as the one implemented in PostgreSQL may make query
response time predictions that are up to 2X slower than the optimal multi-join
query plan over memory-resident data. This paper introduces a memory I/O cost
model to identify good evaluation strategies for complex query plans with
multiple hash-based equi-joins over memory-resident data. The proposed cost
model is carefully validated for accuracy using three different systems,
including an Amazon EC2 instance, to control for hardware-specific differences.
Prior work in parallel query evaluation has advocated right-deep and bushy
trees for multi-join queries due to their greater parallelization and
pipelining potential. A surprising finding is that the conventional wisdom from
shared-nothing disk-based systems does not directly apply to the modern
shared-everything memory hierarchy. As corroborated by our model, the
performance gap between the optimal left-deep and right-deep query plan can
grow to about 10X as the number of joins in the query increases.
Comment: 15 pages, 8 figures, extended version of the paper to appear in
SoCC'1
A-twisted correlators and Hori dualities
The Hori-Tong and Hori dualities are infrared dualities between
two-dimensional gauge theories with $\mathcal{N}=(2,2)$ supersymmetry, which
are reminiscent of four-dimensional Seiberg dualities. We provide additional
evidence for those dualities with $U(N_c)$, $USp(2N_c)$, $SO(N)$ and $O(N)$
gauge groups, by matching correlation functions of Coulomb branch operators on
a Riemann surface $\Sigma_g$, in the presence of the topological $A$-twist. The
$O(N)$ theories studied, denoted by $O_+(N)$ and $O_-(N)$, can be understood
as $\mathbb{Z}_2$ orbifolds of an $SO(N)$ theory. The correlators of these
theories on $\Sigma_g$ with $g > 0$ are obtained by computing correlators with
$\mathbb{Z}_2$-twisted boundary conditions and summing them up with weights
determined by the orbifold projection.
Comment: 45 pages plus appendix; v2: updated bibliography and acknowledgement