Structurally Tractable Uncertain Data
Many data management applications must deal with data which is uncertain,
incomplete, or noisy. However, on existing uncertain data representations, we
cannot tractably perform the important query evaluation tasks of determining
query possibility, certainty, or probability: these problems are hard on
arbitrary uncertain input instances. We thus ask whether we could restrict the
structure of uncertain data so as to guarantee the tractability of exact query
evaluation. We present our tractability results for tree and tree-like
uncertain data, and a vision for probabilistic rule reasoning. We also study
uncertainty about order, proposing a suitable representation, and study
uncertain data conditioned by additional observations.
Comment: 11 pages, 1 figure, 1 table. To appear in SIGMOD/PODS PhD Symposium 201
From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back
In this work we establish and investigate connections between causes for
query answers in databases, database repairs with respect to denial constraints, and
consistency-based diagnosis. The first two are relatively new research areas in
databases, and the third one is an established subject in knowledge
representation. We show how to obtain database repairs from causes, and the
other way around. Causality problems are formulated as diagnosis problems, and
the diagnoses provide causes and their responsibilities. The vast body of
research on database repairs can be applied to the newer problems of computing
actual causes for query answers and their responsibilities. These connections,
which are interesting per se, allow us, after a transition (inspired by
consistency-based diagnosis) to computational problems on hitting sets and
vertex covers in hypergraphs, to obtain several new algorithmic and complexity
results for database causality.
Comment: To appear in Theory of Computing Systems. By invitation to special
issue with extended papers from ICDT 2015 (paper arXiv:1412.4311
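
The hitting-set problem that the abstract refers to can be illustrated with a minimal sketch. This is a brute-force enumeration, not the paper's algorithm: it assumes each hyperedge is a set of tuple identifiers (the names `t1` to `t4` are hypothetical) and returns every hitting set of minimum cardinality. Its running time is exponential in the number of distinct elements, so it is only suitable for tiny inputs.

```python
from itertools import combinations


def minimum_hitting_sets(hyperedges):
    """Return all minimum-cardinality sets that intersect every hyperedge.

    Brute force: candidate sets are tried in order of increasing size,
    so the first size that yields any hitting set is the minimum.
    """
    edges = [frozenset(e) for e in hyperedges]
    universe = sorted(set().union(*edges)) if edges else []
    for size in range(len(universe) + 1):
        hits = [
            set(candidate)
            for candidate in combinations(universe, size)
            if all(edge & set(candidate) for edge in edges)
        ]
        if hits:
            return hits
    return []  # reached only if some hyperedge is empty


# Hypothetical example: three hyperedges over four tuple identifiers.
example_edges = [{"t1", "t2"}, {"t2", "t3"}, {"t3", "t4"}]
example_result = minimum_hitting_sets(example_edges)
```

For the example hyperedges, no single element intersects all three sets, so every minimum hitting set has two elements.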
Computationally Assisted Quality Control for Public Health Data Streams
Irregularities in public health data streams (such as COVID-19 cases) hamper
data-driven decision-making for public health stakeholders. A real-time,
computer-generated list of the most important, outlying data points from
thousands of daily-updated public health data streams could assist an expert
reviewer in identifying these irregularities. However, existing outlier
detection frameworks perform poorly on this task because they do not account
for the data volume or for the statistical properties of public health streams.
Accordingly, we developed FlaSH (Flagging Streams in public Health), a
practical outlier detection framework for public health data users that uses
simple, scalable models to capture these statistical properties explicitly. In
an experiment where human experts evaluate FlaSH and existing methods
(including deep learning approaches), FlaSH scales to the data volume of this
task, matches or exceeds these other methods in mean accuracy, and identifies
the outlier points that users empirically rate as more helpful. Based on these
results, FlaSH has been deployed on data streams used by public health
stakeholders.
Comment: https://github.com/cmu-delphi/covidcast-indicators/tree/main/_delphi_utils_python/delphi_utils/flash_eva
The Development of Vital Precincts in Doha: Urban Regeneration and Socio-Cultural Factors
Over the past few decades, Doha, the capital of the State of Qatar, has experienced extraordinary economic growth and a transformation of its built environment. This has been driven by post-WWII oil and natural gas production, which has transformed the economy of Qatar from one based on fishing and pearling to a diversified economy. The State of Qatar is currently investing large funds in the transformation of Doha’s built environment and the development of new major urban public transit networks (i.e. the Doha Metro, the Lusail light rail transit (LRT) and a bus rapid transit (BRT)). Authorities are committed to having the new transport systems operational before the 2022 FIFA World Cup. This paper discusses the key factors and challenges to be studied and considered when integrating the Doha Metro transport system with land use. Specifically, it argues that the key factors for the design and planning of successful, functional and economically vital precincts developed in the proximity of the new Doha Metro stations relate to tangible or financial-economic aspects, as well as intangible or socio-cultural aspects.
Optimal column layout for hybrid workloads
Data-intensive analytical applications need to support both efficient reads and writes. However, a data layout that suits an update-heavy workload is usually poorly suited to a read-mostly one, and vice versa. Modern analytical data systems rely on columnar layouts and employ delta stores to inject new data and updates. We show that for hybrid workloads we can achieve close to one order of magnitude better performance by tailoring the column layout design to the data and query workload. Our approach navigates the possible design space of the physical layout: it organizes each column’s data by determining the number of partitions, their corresponding sizes and ranges, and the amount of buffer space and how it is allocated. We frame these design decisions as an optimization problem that, given workload knowledge and performance requirements, provides an optimal physical layout for the workload at hand. To evaluate this work, we build an in-memory storage engine, Casper, and we show that it outperforms state-of-the-art data layouts of analytical systems for hybrid workloads. Casper delivers up to 2.32x higher throughput for update-intensive workloads and up to 2.14x higher throughput for hybrid workloads. We further show how to make data layout decisions robust to workload variation by carefully selecting the input of the optimization.
http://www.vldb.org/pvldb/vol12/p2393-athanassoulis.pdf
Published version
Ptolemaic Indexing
This paper discusses a new family of bounds for use in similarity search,
related to those used in metric indexing, but based on Ptolemy's inequality,
rather than the metric axioms. Ptolemy's inequality holds for the well-known
Euclidean distance, but is also shown here to hold for quadratic form metrics
in general, with Mahalanobis distance as an important special case. The
inequality is examined empirically on both synthetic and real-world data sets
and is also found to hold approximately, with a very low degree of error, for
important distances such as the angular pseudometric and several Lp norms.
Indexing experiments demonstrate a highly increased filtering power compared to
existing, triangular methods. It is also shown that combining the Ptolemaic and
triangular filtering can lead to better results than using either approach on
its own.
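
To make the difference between the two kinds of bound concrete, here is a minimal sketch assuming the standard pivot-based formulation. Ptolemy's inequality for four points q, o, p, s states d(q,p)·d(o,s) ≤ d(q,o)·d(p,s) + d(q,s)·d(o,p), which rearranges to a lower bound on d(q,o) from distances to two pivots p and s. The function names and the sample points below are illustrative assumptions, not taken from the paper.

```python
import math


def triangle_lower_bound(d_qp, d_op):
    """Lower bound on d(q, o) from one pivot p via the triangle inequality."""
    return abs(d_qp - d_op)


def ptolemaic_lower_bound(d_qp, d_qs, d_op, d_os, d_ps):
    """Lower bound on d(q, o) from two pivots p and s via Ptolemy's inequality.

    Assumes the pivots are distinct, so that d_ps > 0.
    """
    return abs(d_qp * d_os - d_qs * d_op) / d_ps


# Illustrative collinear points (Euclidean distance on the real line).
q, s, p, o = (0.0,), (1.0,), (3.0,), (4.0,)
d = math.dist

true_distance = d(q, o)
tri_bound = max(
    triangle_lower_bound(d(q, p), d(o, p)),
    triangle_lower_bound(d(q, s), d(o, s)),
)
ptol_bound = ptolemaic_lower_bound(d(q, p), d(q, s), d(o, p), d(o, s), d(p, s))
```

In this hand-picked configuration the true distance is 4, the best triangular bound over the two pivots is 2, and the Ptolemaic bound is 4, so the Ptolemaic bound is tight here. In a filtering setting, an object can be discarded whenever its lower bound exceeds the query radius, so a larger bound rules out more candidates; how often this happens in practice depends on the data and the pivots chosen.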
The Urban Regeneration of West Bay, Business District of Doha (State of Qatar)
The State of Qatar is undertaking the construction of an advanced public railway transport system. However, researchers argue that the integration of transit stations in existing urban villages can lead to a decline in quality of life and cause a loss of local culture and identity in the built environment. The aim of this research study is to investigate the impact of the transit station of West Bay, the business district of Doha, on the quality of life and liveability of the inhabitants. The findings will help determine urban design strategies for enhancing the quality of life of the district.