Search CORE

1,711 research outputs found

Inferring individual attributes from search engine queries and auxiliary information

Author: Burl M. C.
Goel S.
Kaufman L.
King G.
Paul M. J.
Quadrianto N.
US Cancer Statistics Working Group
Yom-Tov E.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/10/2016
Field of study

Internet data has surfaced as a primary source for investigation of different aspects of human behavior. A crucial step in such studies is finding a suitable cohort (i.e., a set of users) that shares a common trait of interest to researchers. However, direct identification of users sharing this trait is often impossible, as the data available to researchers is usually anonymized to preserve user privacy. To facilitate research on specific topics of interest, especially in medicine, we introduce an algorithm for identifying a trait of interest in anonymous users. We illustrate how a small set of labeled examples, together with statistical information about the entire population, can be aggregated to obtain labels on unseen examples. We validate our approach using labeled data from the political domain. We provide two applications of the proposed algorithm to the medical domain. In the first, we demonstrate how to identify users whose search patterns indicate they might be suffering from certain types of cancer. In the second, we detail an algorithm to predict the distribution of diseases given their incidence in a subset of the population at study, making it possible to predict disease spread from partial epidemiological data

arXiv.org e-Print Archive

Crossref

Building Efficient Query Engines in a High-Level Language

Author: Klonatos Yannis
Koch Christoph
Shaikhha Amir
Publication venue
Publication date: 16/12/2016
Field of study

Abstraction without regret refers to the vision of using high-level programming languages for systems development without experiencing a negative impact on performance. A database system designed according to this vision offers both increased productivity and high performance, instead of sacrificing the former for the latter as is the case with existing, monolithic implementations that are hard to maintain and extend. In this article, we realize this vision in the domain of analytical query processing. We present LegoBase, a query engine written in the high-level language Scala. The key technique to regain efficiency is to apply generative programming: LegoBase performs source-to-source compilation and optimizes the entire query engine by converting the high-level Scala code to specialized, low-level C code. We show how generative programming allows to easily implement a wide spectrum of optimizations, such as introducing data partitioning or switching from a row to a column data layout, which are difficult to achieve with existing low-level query compilers that handle only queries. We demonstrate that sufficiently powerful abstractions are essential for dealing with the complexity of the optimization effort, shielding developers from compiler internals and decoupling individual optimizations from each other. We evaluate our approach with the TPC-H benchmark and show that: (a) With all optimizations enabled, LegoBase significantly outperforms a commercial database and an existing query compiler. (b) Programmers need to provide just a few hundred lines of high-level code for implementing the optimizations, instead of complicated low-level code that is required by existing query compilation approaches. (c) The compilation overhead is low compared to the overall execution time, thus making our approach usable in practice for compiling query engines

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Dataset Discovery and Exploration: A Survey

Author: Chen Jiaoyan
Paton Norman
Wu Zhenyu
Publication venue
Publication date: 04/10/2023
Field of study

The University of Manchester - Institutional Repository

Recommended from our members

Guide Me in Analysis: A Framework for Guidance Designers

Author: Andrienko G.
Andrienko N.
Ceneda D.
Gschwandtner T.
Miksch S.
Piccolotto N.
Schreck T.
Streit M.
Suschnigg J.
Tominski C.
Publication venue: 'Wiley'
Publication date: 30/09/2020
Field of study

Guidance is an emerging topic in the field of visual analytics. Guidance can support users in pursuing their analytical goals more efficiently and help in making the analysis successful. However, it is not clear how guidance approaches should be designed and what specific factors should be considered for effective support. In this paper, we approach this problem from the perspective of guidance designers. We present a framework comprising requirements and a set of specific phases designers should go through when designing guidance for visual analytics. We relate this process with a set of quality criteria we aim to support with our framework, that are necessary for obtaining a suitable and effective guidance solution. To demonstrate the practical usability of our methodology, we apply our framework to the design of guidance in three analysis scenarios and a design walk-through session. Moreover, we list the emerging challenges and report how the framework can be used to design guidance solutions that mitigate these issues

City Research Online

Crossref

Transferring a Semantic Representation for Person Re-Identification and Search

Author: Hospedales TM
IEEE
Shi Z
Xiang T
Publication venue
Publication date: 08/12/2015
Field of study

Queen Mary Research Online