Making Queries Tractable on Big Data with Preprocessing
A query class is traditionally considered tractable if there exists a polynomial-time (PTIME) algorithm to answer its queries. When it comes to big data, however, PTIME algorithms often become infeasible in practice. A traditional and effective approach to coping with this is to preprocess data off-line, so that queries in the class can be subsequently evaluated on the data efficiently. This paper aims to provide a formal foundation for this approach in terms of computational complexity. (1) We propose a set of Π-tractable queries, denoted by ΠT⁰Q, to characterize classes of queries that can be answered in parallel poly-logarithmic time (NC) after PTIME preprocessing. (2) We show that several natural query classes are Π-tractable and are feasible on big data. (3) We also study a set ΠTQ of query classes that can be effectively converted to Π-tractable queries by re-factorizing their data and queries for preprocessing. We introduce a form of NC reductions to characterize such conversions. (4) We show that a natural query class is complete for ΠTQ. (5) We also show that ΠT⁰Q ⊂ P unless P = NC, i.e., the set ΠT⁰Q of all Π-tractable queries is properly contained in the set P of all PTIME queries. Nonetheless, ΠTQ = P, i.e., all PTIME query classes can be made Π-tractable via proper re-factorizations. This work is a step towards understanding the tractability of queries in the context of big data.
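The preprocessing idea can be illustrated with a toy example that is not from the paper: a linear scan answers membership queries in O(n), but after a one-off PTIME sort the same queries take O(log n). This is only an analogy for the off-line/on-line split the abstract describes, not the paper's actual construction.

```python
from bisect import bisect_left

def preprocess(records):
    """Off-line step, done once in PTIME: sort the data, O(n log n)."""
    return sorted(records)

def query(index, key):
    """On-line step: answer a membership query in O(log n)
    via binary search over the preprocessed index."""
    i = bisect_left(index, key)
    return i < len(index) and index[i] == key
```

The expensive work is paid once; every subsequent query is exponentially cheaper than a scan, which is the trade-off the paper formalizes.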
Entropy-scaling search of massive biological data
Many datasets exhibit a well-defined structure that can be exploited to
design faster search tools, but it is not always clear when such acceleration
is possible. Here, we introduce a framework for similarity search based on
characterizing a dataset's entropy and fractal dimension. We prove that
searching scales in time with metric entropy (number of covering hyperspheres),
if the fractal dimension of the dataset is low, and scales in space with the
sum of metric entropy and information-theoretic entropy (randomness of the
data). Using these ideas, we present accelerated versions of standard tools,
with no loss in specificity and little loss in sensitivity, for use in three
domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics
(MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search
(esFragBag, 10x speedup of FragBag). Our framework can be used to achieve
"compressive omics," and the general theory can be readily applied to data
science problems outside of biology.
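A minimal sketch of the coarse-then-fine search strategy the abstract alludes to, written from the description rather than the authors' code: cluster the data into covering balls of radius `cluster_radius` (the number of balls tracks the metric entropy), then answer a range query by scanning only clusters whose center passes a triangle-inequality test. All function names here are illustrative.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_coarse_index(points, cluster_radius):
    """Greedy covering: assign each point to the first center within
    cluster_radius, opening a new center when none is close enough."""
    centers, clusters = [], []
    for p in points:
        for i, c in enumerate(centers):
            if dist(p, c) <= cluster_radius:
                clusters[i].append(p)
                break
        else:
            centers.append(p)
            clusters.append([p])
    return centers, clusters

def range_search(query, radius, centers, clusters, cluster_radius):
    """Coarse stage: keep clusters whose center lies within
    radius + cluster_radius; by the triangle inequality no true hit
    can be discarded. Fine stage: exact scan of surviving clusters."""
    hits = []
    for c, members in zip(centers, clusters):
        if dist(query, c) <= radius + cluster_radius:
            hits.extend(p for p in members if dist(query, p) <= radius)
    return hits
```

When the data has low fractal dimension, few clusters survive the coarse filter, so the fine stage touches only a small fraction of the dataset.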
Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)
Real-time analytics that requires integration and aggregation of
heterogeneous and distributed streaming and static data is a typical task in
many industrial scenarios such as diagnostics of turbines in Siemens. The
ontology-based data access (OBDA) approach has great potential to facilitate
such tasks; however, it has a number of limitations in dealing with analytics
that restrict its use in important industrial applications. Based on our
experience with Siemens, we argue that in order to overcome those limitations
OBDA should be extended to become analytics, source, and cost aware. In this
work we propose such an extension. In particular, we propose an ontology,
mapping, and query language for OBDA, where aggregate and other analytical
functions are first-class citizens. Moreover, we develop query optimisation
techniques that allow analytical tasks to be processed efficiently over static
and streaming data. We implement our approach in a system and evaluate it with
Siemens turbine data.
Parameter Compilation
In resolving instances of a computational problem, if multiple instances of
interest share a feature in common, it may be fruitful to compile this feature
into a format that allows for more efficient resolution, even if the
compilation is relatively expensive. In this article, we introduce a formal
framework for classifying problems according to their compilability. The basic
object in our framework is that of a parameterized problem, which here is a
language along with a parameterization---a map which provides, for each
instance, a so-called parameter on which compilation may be performed. Our
framework is positioned within the paradigm of parameterized complexity, and
our notions are relatable to established concepts in the theory of
parameterized complexity. Indeed, we view our framework as playing a unifying
role, integrating together parameterized complexity and compilability theory.