Contributions to the Optimization of Multidimensional Queries

Author: Maabout Sofian
Publication venue: HAL CCSD
Publication date: 12/12/2014

Analyzing data consists of choosing a subset of the dimensions that describe it in order to extract useful information. However, the "interesting" dimensions are rarely known a priori. Analysis therefore becomes an exploratory activity in which each pass translates into a query. It thus becomes essential to propose query optimization solutions that take a global view of the process rather than optimizing each query independently of the others. We present our contributions within this exploratory approach, focusing on three types of queries: (i) border computation, (ii) so-called OLAP (On-Line Analytical Processing) queries over data cubes, and (iii) skyline-type preference queries.

Thèses en Ligne
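In the OLAP setting mentioned in the abstract above, every subset of the dimensions defines one group-by (a "cuboid"), so d dimensions give 2^d candidate aggregations. The sketch below is a deliberately naive illustration of that lattice, not the thesis's optimization method: it assumes SUM as the aggregate and recomputes every cuboid from the base rows. The function name `data_cube` and the sales data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def data_cube(rows, dims, measure):
    """Naively compute SUM(measure) for every group-by (cuboid) over `dims`.

    Returns {grouping dimensions: {group key: total}}; the empty tuple ()
    holds the grand total. With d dimensions there are 2**d cuboids.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for group_by in combinations(dims, k):
            totals = defaultdict(float)
            # Each cuboid is recomputed from the base rows; OLAP engines
            # typically derive coarser cuboids from finer, materialized ones.
            for row in rows:
                key = tuple(row[d] for d in group_by)
                totals[key] += row[measure]
            cube[group_by] = dict(totals)
    return cube


# Hypothetical sales data, for illustration only.
sales = [
    {"region": "EU", "product": "A", "year": 2014, "sales": 10.0},
    {"region": "EU", "product": "B", "year": 2014, "sales": 5.0},
    {"region": "US", "product": "A", "year": 2015, "sales": 7.0},
]
cube = data_cube(sales, ("region", "product", "year"), "sales")
print(len(cube))             # 8
print(cube[("region",)])     # {('EU',): 15.0, ('US',): 7.0}
```

The cost of this naive version grows with 2^d full scans, which is why choosing which cuboids to precompute matters once analysis becomes exploratory.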

Efficient subspace skyline query based on user preference using MapReduce

Author: DONG Mianxiong
JI Changqing
LI Yuanyuan
LI Zhiyang
QU Wenyu
WU Junfeng
Publication venue: Elsevier BV
Publication date: 01/12/2015

Subspace skyline, as an important variant of skyline, has been widely applied to multiple-criteria decision making and business planning. With the development of the mobile internet, subspace skyline queries in mobile distributed environments have recently attracted considerable attention. However, efficiently obtaining a meaningful subset of skyline points in an arbitrary subspace remains a challenging task in the current mobile internet. For a growing number of mobile applications, subspace skyline queries on mobile units are typically constrained by big data and limited wireless bandwidth. To address this issue, in this paper we propose a system model that supports subspace skyline queries in mobile distributed environments. We also present an efficient algorithm for processing the Subspace Skyline Query using MapReduce (SSQ), which can obtain a meaningful subset of points from the full set of skyline points in any subspace. The SSQ algorithm divides a subspace skyline query into two processing phases: the preprocess phase and the query phase. The preprocess phase consists of pruning and index construction, which are designed to reduce network delay and response time. The query phase provides two filtering methods, SQM-filtering and ε-filtering, which filter the skyline points according to user preference and reduce network cost. Extensive experiments on real and synthetic data indicate that our algorithm is highly efficient and that the pruning strategy further improves its efficiency.

Muroran-IT Academic Resource Archive
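The sketch below does not reproduce SSQ, its index, or its SQM- and ε-filtering. It only illustrates the general local-then-global pattern that distributed skyline processing builds on: each "mapper" computes the skyline of its own partition, and a "reducer" merges the local skylines. The map and reduce steps are plain Python functions (no MapReduce framework is used), the subspace is given as a tuple of dimension indices, and smaller values are assumed better; all names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point, dims: Sequence[int]) -> bool:
    """p dominates q on subspace `dims` (smaller is better on every dimension)."""
    return all(p[i] <= q[i] for i in dims) and any(p[i] < q[i] for i in dims)


def skyline(points: Iterable[Point], dims: Sequence[int]) -> List[Point]:
    """Block-nested-loops skyline restricted to the dimensions in `dims`."""
    window: List[Point] = []
    for p in points:
        if any(dominates(q, p, dims) for q in window):
            continue  # p is dominated by a point already kept
        # Drop kept points that p dominates, then keep p.
        window = [q for q in window if not dominates(p, q, dims)]
        window.append(p)
    return window


def map_phase(partition: List[Point], dims: Sequence[int]) -> List[Point]:
    # Each mapper emits only the local skyline of its data partition.
    return skyline(partition, dims)


def reduce_phase(local_skylines: List[List[Point]], dims: Sequence[int]) -> List[Point]:
    # A globally undominated point is also locally undominated, so the global
    # skyline is the skyline of the union of the local skylines.
    merged = [p for local in local_skylines for p in local]
    return skyline(merged, dims)


def subspace_skyline(partitions: List[List[Point]], dims: Sequence[int]) -> List[Point]:
    return reduce_phase([map_phase(part, dims) for part in partitions], dims)


partitions = [
    [(1, 5, 3), (2, 2, 9), (4, 4, 4)],
    [(3, 1, 8), (2, 3, 1), (5, 5, 5)],
]
print(subspace_skyline(partitions, dims=(0, 1)))  # [(1, 5, 3), (2, 2, 9), (3, 1, 8)]
```

Only local skylines cross the partition boundary, which is the property that makes this pattern attractive when bandwidth is the bottleneck.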


Complex Query Operators on Modern Parallel Architectures

Author: Zois Vasileios
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Identifying interesting objects from a large data collection is a fundamental problem for multi-criteria decision-making applications. In Relational Database Management Systems (RDBMS), the most popular complex query operators used to solve this type of problem are the Top-K selection operator and the Skyline operator. Top-K selection retrieves the k highest-ranking tuples from a given relation, as determined by a user-defined aggregation function. Skyline selection retrieves those tuples whose attributes offer Pareto-optimal trade-offs in a given relation. Efficient Top-K query processing entails minimizing tuple evaluations by combining elaborate processing schemes with sophisticated data structures that enable early termination. Skyline query evaluation relies on processing strategies geared towards early termination and the pruning of incomparable tuples. The rapid increase in memory capacity and decreasing costs have been the main drivers behind the development of main-memory database systems. Although migrating query processing in-memory has created many opportunities to reduce query latency, attaining such improvements has been very challenging due to the growing gap between processor and main memory speeds. Addressing this limitation has been made easier by the rapid proliferation of multi-core and many-core architectures. However, their use in real systems has been hindered by the lack of suitable parallel algorithms that focus on algorithmic efficiency. In this thesis, we study the Top-K and Skyline selection operators in depth, in the context of emerging parallel architectures. Our ultimate goal is to provide practical guidelines for developing work-efficient algorithms suitable for parallel main-memory processing. We concentrate on multi-core (CPU), many-core (GPU), and processing-in-memory (PIM) architectures, developing solutions optimized for high throughput and low latency.

The first part of this thesis focuses on Top-K selection, presenting the details of early termination algorithms that we developed specifically for parallel architectures and various types of accelerators (i.e., GPU, PIM). The second part concentrates on Skyline selection and the development of a massively parallel, load-balanced algorithm for PIM architectures. Our work consolidates performance results across different parallel architectures using synthetic and real data with varying query parameters and distributions for both problems. The experimental results demonstrate improvements of several orders of magnitude in throughput and query latency, validating the effectiveness of our proposed solutions for the Top-K and Skyline selection operators.

eScholarship - University of California
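Early termination for Top-K selection is classically illustrated by Fagin's Threshold Algorithm (TA). The sketch below is that textbook, sequential algorithm, not the parallel GPU or PIM algorithms developed in the thesis. It assumes SUM as the user-defined aggregation function and per-attribute lists held in memory that support both sorted access and random access (via a dictionary); every object is assumed to appear once in every list. Object ids and scores are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Entry = Tuple[str, float]


def threshold_algorithm(lists: List[List[Entry]], k: int) -> Tuple[List[Entry], int]:
    """Fagin's Threshold Algorithm (TA) for top-k under SUM aggregation.

    `lists` holds one list per attribute, sorted by descending score.
    Returns the top-k (object, total score) pairs and the number of
    sorted-access rounds performed before stopping.
    """
    # Random-access index per attribute: object id -> attribute score.
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]
    totals: Dict[str, float] = {}
    for depth in range(len(lists[0])):
        for lst in lists:
            obj, _ = lst[depth]              # sorted access
            if obj not in totals:            # random access for unseen objects
                totals[obj] = sum(index[obj] for index in lookup)
        # No object not yet seen can score more than the sum of the current row.
        threshold = sum(lst[depth][1] for lst in lists)
        top = heapq.nlargest(k, totals.items(), key=lambda item: item[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth + 1            # early termination
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1]), len(lists[0])


scores = [
    [("x", 0.9), ("y", 0.8), ("z", 0.3), ("w", 0.1)],  # attribute 1
    [("y", 0.9), ("x", 0.7), ("w", 0.5), ("z", 0.2)],  # attribute 2
]
top, rounds = threshold_algorithm(scores, k=2)
print([obj for obj, _ in top], rounds)   # ['y', 'x'] 2
```

Here the top two objects are settled after two of the four rows, so objects `z` and `w` are never evaluated.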

SkyLens: Visual analysis of skyline on multi-dimensional data

Author: CHEN Yuan
CUI Weiwei
DU Xinnan
LEE Dik Lun
QU Huamin
WANG Yong
WU Yanhong
ZHAO Xun
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2018

Skyline queries have wide-ranging applications in fields that involve multi-criteria decision making, including tourism, the retail industry, and human resources. By automatically removing inferior candidates, skyline queries allow users to focus on a subset of superior data items (i.e., the skyline), thus reducing the decision-making overhead. However, users are still required to interpret and compare these superior items manually before making a successful choice. This task is challenging for two reasons. First, people usually have fuzzy, unstable, and inconsistent preferences when presented with multiple candidates. Second, skyline queries do not reveal why certain skyline points are superior in a multi-dimensional space. To address these issues, we propose SkyLens, a visual analytics system that aims to reveal the superiority of skyline points from different perspectives and at different scales to aid users in their decision making. Two scenarios demonstrate the usefulness of SkyLens on two datasets with a dozen attributes. A qualitative study also shows that users can efficiently accomplish skyline understanding and comparison tasks with SkyLens.

Comment: 10 pages. Accepted for publication at IEEE VIS 2017 (in proceedings of VAST)

arXiv.org e-Print Archive

Institutional Knowledge at Singapore Management University
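SkyLens itself is a visual analytics system, and the sketch below does not reproduce its views or measures. As an assumption for illustration only, it computes two simple quantities a user might inspect to see why a skyline point is superior: how many items it dominates, and the attributes on which it is best within the skyline. Larger values are assumed better; item and attribute names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(p: Vector, q: Vector) -> bool:
    """p dominates q: at least as good everywhere, strictly better somewhere (larger is better)."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def skyline(items: Dict[str, Vector]) -> Dict[str, Vector]:
    # Keep the items that no other item dominates.
    return {name: v for name, v in items.items()
            if not any(dominates(other, v) for other in items.values())}


def dominance_counts(sky: Dict[str, Vector], items: Dict[str, Vector]) -> Dict[str, int]:
    # Number of items each skyline point dominates: one rough measure of its superiority.
    return {name: sum(dominates(v, other) for other in items.values())
            for name, v in sky.items()}


def strongest_attributes(sky: Dict[str, Vector], attrs: Sequence[str]) -> Dict[str, List[str]]:
    # Attributes on which each skyline point attains the best value within the skyline.
    best = [max(v[i] for v in sky.values()) for i in range(len(attrs))]
    return {name: [attrs[i] for i, x in enumerate(v) if x == best[i]]
            for name, v in sky.items()}


items = {"A": (9, 2, 5), "B": (7, 7, 7), "C": (3, 9, 4), "D": (6, 6, 6), "E": (2, 1, 3)}
sky = skyline(items)
print(dominance_counts(sky, items))                               # {'A': 1, 'B': 2, 'C': 1}
print(strongest_attributes(sky, ("speed", "comfort", "value")))  # {'A': ['speed'], 'B': ['value'], 'C': ['comfort']}
```

The two outputs pull in different directions: B dominates the most items yet leads on only one attribute, which is the kind of trade-off the abstract says plain skyline output leaves unexplained.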

Detection and classification of changes in evolving data streams

Author: Gaber M.
Yu P.
Publication date: 01/01/2006

Portsmouth University Research Portal (Pure)

On Obtaining Stable Rankings

Author: Asudeh Abolfazl
Jagadish H. V.
Miklau Gerome
Stoyanovich Julia
Publication venue: VLDB Endowment
Publication date: 01/01/2018

Decision making is challenging when there is more than one criterion to consider. In such cases, it is common to assign a goodness score to each item as a weighted sum of its attribute values and rank the items accordingly. Clearly, the ranking obtained depends on the weights used for this summation. Ideally, one would want the ranked order not to change if the weights are changed slightly. We call this property the stability of the ranking. A consumer of a ranked list may trust the ranking more if it has high stability. A producer of a ranked list prefers to choose weights that result in a stable ranking, both to earn the trust of potential consumers and because a stable ranking is intrinsically likely to be more meaningful. In this paper, we develop a framework that can be used to assess the stability of a provided ranking and to obtain a stable ranking within an "acceptable" range of weight values (called "the region of interest"). We address the case where the user cares about the rank order of the entire set of items, and also the case where the user cares only about the top-k items. Using a geometric interpretation, we propose algorithms that produce stable rankings. In addition to theoretical analyses, we conduct extensive experiments on real datasets that validate our proposal.

arXiv.org e-Print Archive

FigShare

University of Illinois at Chicago: UIC INDIGO (INtellectual property in DIGital form available online in an Open environment)
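The paper's framework is geometric; the sketch below swaps in a different, simpler technique, a Monte Carlo estimate of stability. It scores items by the weighted sum described in the abstract, samples weight vectors uniformly from a small box of half-width `delta` around the reference weights (a hypothetical stand-in for the paper's region of interest), and reports how often the full ranking, or only its top-k prefix, stays unchanged. All names and data are hypothetical.

```python
import random
from typing import Dict, Optional, Sequence, Tuple


def rank(items: Dict[str, Sequence[float]], weights: Sequence[float]) -> Tuple[str, ...]:
    """Item names ordered by weighted-sum score, highest first (ties keep input order)."""
    def score(name: str) -> float:
        return sum(w * x for w, x in zip(weights, items[name]))
    return tuple(sorted(items, key=score, reverse=True))


def stability(items: Dict[str, Sequence[float]], weights: Sequence[float],
              delta: float = 0.05, samples: int = 10_000,
              top_k: Optional[int] = None, seed: int = 0) -> float:
    """Fraction of perturbed weight vectors that reproduce the reference ranking.

    Weights are perturbed uniformly within +/- delta and clipped at 0.
    If `top_k` is given, only the order of the first top_k items is compared.
    """
    rng = random.Random(seed)
    reference = rank(items, weights)[:top_k]
    same = 0
    for _ in range(samples):
        perturbed = [max(0.0, w + rng.uniform(-delta, delta)) for w in weights]
        if rank(items, perturbed)[:top_k] == reference:
            same += 1
    return same / samples


clear = {"a": (1.0, 1.0), "b": (0.0, 0.0)}   # a wins under any non-negative weights
tied = {"a": (1.0, 0.0), "b": (0.0, 1.0)}    # order flips when w1 and w2 swap sizes
print(stability(clear, (0.5, 0.5)))   # 1.0
print(stability(tied, (0.5, 0.5)))    # close to 0.5
```

Sampling only approximates stability and gives no guarantee near region boundaries, which is one reason an exact geometric treatment is preferable when it is tractable.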

Supporting case-based retrieval by similarity skylines: Basic concepts and extensions

Author: Hüllermeier Eyke
Prados Suárez María Belén
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

Conventional approaches to similarity search and case-based retrieval, such as nearest neighbor search, require the specification of a global similarity measure, which is typically expressed as an aggregation of local measures pertaining to different aspects of a case. Since properly aggregating local measures is often quite difficult, we propose a novel concept called the similarity skyline. Roughly speaking, the similarity skyline of a case base is defined as the subset of cases that are most similar to a given query in the Pareto sense. Thus, the idea is to start from a d-dimensional comparison between cases in terms of d (local) distance measures and to identify those cases that are maximally similar in the sense of the Pareto dominance relation [2]. To refine the retrieval result, we propose a method for computing maximally diverse subsets of a similarity skyline. Moreover, we propose a generalization of similarity skylines that can deal with uncertain data described in terms of interval or fuzzy attribute values. The method is applied to similarity search over uncertain archaeological data.

Repositorio Institucional Universidad de Granada
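To make the definition in the abstract above concrete, the sketch below computes a similarity skyline for crisp numeric data. It assumes absolute difference as every local distance measure (an illustrative choice, not one taken from the paper) and does not cover the paper's diversity method or its interval and fuzzy extensions. Case names and values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def local_distances(case: Sequence[float], query: Sequence[float]) -> Tuple[float, ...]:
    # Assumed local measure: absolute difference on each numeric attribute.
    return tuple(abs(c - q) for c, q in zip(case, query))


def pareto_dominates(d1: Sequence[float], d2: Sequence[float]) -> bool:
    """d1 dominates d2: no farther on any attribute and strictly closer on at least one."""
    return all(a <= b for a, b in zip(d1, d2)) and any(a < b for a, b in zip(d1, d2))


def similarity_skyline(case_base: Dict[str, Sequence[float]],
                       query: Sequence[float]) -> List[str]:
    """Cases whose vector of d local distances to `query` is Pareto-minimal."""
    dist = {name: local_distances(case, query) for name, case in case_base.items()}
    return [name for name, d in dist.items()
            if not any(pareto_dominates(other, d) for other in dist.values())]


cases = {"c1": (5, 9), "c2": (8, 5), "c3": (6, 6), "c4": (9, 9), "c5": (4, 7)}
print(similarity_skyline(cases, query=(5, 5)))   # ['c1', 'c2', 'c3']
```

Here c3 is the nearest neighbor under an equally weighted sum of distances, and it appears in the result. More generally, any case that minimizes a weighted sum with strictly positive weights is Pareto-minimal, so the similarity skyline contains the answer of every such aggregation without committing to one.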