Inferring individual attributes from search engine queries and auxiliary information
Internet data has surfaced as a primary source for investigation of different
aspects of human behavior. A crucial step in such studies is finding a suitable
cohort (i.e., a set of users) that shares a common trait of interest to
researchers. However, direct identification of users sharing this trait is
often impossible, as the data available to researchers is usually anonymized to
preserve user privacy. To facilitate research on specific topics of interest,
especially in medicine, we introduce an algorithm for identifying a trait of
interest in anonymous users. We illustrate how a small set of labeled examples,
together with statistical information about the entire population, can be
aggregated to obtain labels on unseen examples. We validate our approach using
labeled data from the political domain.
We provide two applications of the proposed algorithm to the medical domain.
In the first, we demonstrate how to identify users whose search patterns
indicate they might be suffering from certain types of cancer. In the second,
we detail an algorithm to predict the distribution of diseases given their
incidence in a subset of the population under study, making it possible to predict
disease spread from partial epidemiological data.
Building Efficient Query Engines in a High-Level Language
Abstraction without regret refers to the vision of using high-level
programming languages for systems development without experiencing a negative
impact on performance. A database system designed according to this vision
offers both increased productivity and high performance, instead of sacrificing
the former for the latter as is the case with existing, monolithic
implementations that are hard to maintain and extend. In this article, we
realize this vision in the domain of analytical query processing. We present
LegoBase, a query engine written in the high-level language Scala. The key
technique to regain efficiency is to apply generative programming: LegoBase
performs source-to-source compilation and optimizes the entire query engine by
converting the high-level Scala code to specialized, low-level C code. We show
how generative programming makes it easy to implement a wide spectrum of
optimizations, such as introducing data partitioning or switching from a row to
a column data layout, which are difficult to achieve with existing low-level
query compilers that handle only queries. We demonstrate that sufficiently
powerful abstractions are essential for dealing with the complexity of the
optimization effort, shielding developers from compiler internals and
decoupling individual optimizations from each other. We evaluate our approach
with the TPC-H benchmark and show that: (a) With all optimizations enabled,
LegoBase significantly outperforms a commercial database and an existing query
compiler. (b) Programmers need to provide just a few hundred lines of
high-level code to implement the optimizations, instead of the complicated
low-level code required by existing query compilation approaches. (c)
The compilation overhead is low compared to the overall execution time, thus
making our approach usable in practice for compiling query engines.
Guide Me in Analysis: A Framework for Guidance Designers
Guidance is an emerging topic in the field of visual analytics. Guidance can support users in pursuing their analytical goals more efficiently and help in making the analysis successful. However, it is not clear how guidance approaches should be designed and what specific factors should be considered for effective support. In this paper, we approach this problem from the perspective of guidance designers. We present a framework comprising requirements and a set of specific phases designers should go through when designing guidance for visual analytics. We relate this process to a set of quality criteria that we aim to support with our framework and that are necessary for obtaining a suitable and effective guidance solution. To demonstrate the practical usability of our methodology, we apply our framework to the design of guidance in three analysis scenarios and a design walk-through session. Moreover, we list the emerging challenges and report how the framework can be used to design guidance solutions that mitigate these issues.