2,916 research outputs found
Contributions à l'Optimisation de Requêtes Multidimensionnelles (Contributions to Multidimensional Query Optimization)
Analyzing data consists of choosing a subset of the dimensions that describe it in order to extract useful information. However, the "interesting" dimensions are rarely known a priori. Analysis then becomes an exploratory activity in which each pass translates into a query. It is therefore essential to propose query optimization solutions that take a global view of the process rather than optimizing each query independently of the others. We present our contributions within this exploratory approach, focusing on three types of queries: (i) the computation of borders, (ii) so-called OLAP (On Line Analytical Processing) queries over data cubes, and (iii) skyline-type preference queries
Efficient subspace skyline query based on user preference using MapReduce
Subspace skyline, an important variant of skyline, has been widely applied to multiple-criteria decision making and business planning. With the development of the mobile internet, subspace skyline queries in mobile distributed environments have recently attracted considerable attention. However, efficiently obtaining the meaningful subset of skyline points in any subspace remains a challenging task in the current mobile internet. For a growing number of mobile applications, subspace skyline queries on mobile units are limited by data volume and wireless bandwidth. To address this issue, we propose a system model that supports subspace skyline queries in a mobile distributed environment. We also present an efficient algorithm for processing the Subspace Skyline Query using MapReduce (SSQ), which obtains the meaningful subset of points from the full set of skyline points in any subspace. The SSQ algorithm divides a subspace skyline query into two processing phases: the preprocess phase and the query phase. The preprocess phase comprises a pruning process and an index construction process, which are designed to reduce network delay and response time. The query phase provides two filtering methods, SQM-filtering and Δ-filtering, which filter the skyline points according to user preference and reduce network cost. Extensive experiments on real and synthetic data indicate that our algorithm is highly efficient and that the pruning strategy further improves its efficiency
Complex Query Operators on Modern Parallel Architectures
Identifying interesting objects from a large data collection is a fundamental problem for multi-criteria decision making applications. In Relational Database Management Systems (RDBMS), the most popular complex query operators used to solve this type of problem are the Top-K selection operator and the Skyline operator. Top-K selection is tasked with retrieving the k highest-ranking tuples from a given relation, as determined by a user-defined aggregation function. Skyline selection retrieves those tuples with attributes offering (Pareto) optimal trade-offs in a given relation. Efficient Top-K query processing entails minimizing tuple evaluations by utilizing elaborate processing schemes combined with sophisticated data structures that enable early termination. Skyline query evaluation involves supporting processing strategies which are geared towards early termination and incomparable tuple pruning. The rapid increase in memory capacity and decreasing costs have been the main drivers behind the development of main-memory database systems. Although the act of migrating query processing in-memory has created many opportunities to improve the associated query latency, attaining such improvements has been very challenging due to the growing gap between processor and main memory speeds. Addressing this limitation has been made easier by the rapid proliferation of multi-core and many-core architectures. However, their utilization in real systems has been hindered by the lack of suitable parallel algorithms that focus on algorithmic efficiency. In this thesis, we study in depth the Top-K and Skyline selection operators in the context of emerging parallel architectures. Our ultimate goal is to provide practical guidelines for developing work-efficient algorithms suitable for parallel main memory processing. We concentrate on multi-core (CPU), many-core (GPU), and processing-in-memory (PIM) architectures, developing solutions optimized for high throughput and low latency. The first part of this thesis focuses on Top-K selection, presenting the specific details of early termination algorithms that we developed specifically for parallel architectures and various types of accelerators (i.e., GPU and PIM). The second part of this thesis concentrates on Skyline selection and the development of a massively parallel load-balanced algorithm for PIM architectures. Our work consolidates performance results across different parallel architectures using synthetic and real data on variable query parameters and distributions for both of the aforementioned problems. The experimental results demonstrate several orders of magnitude better throughput and query latency, thus validating the effectiveness of our proposed solutions for the Top-K and Skyline selection operators
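As a rough illustration of the two operators this abstract defines, the following minimal Python sketch computes a Top-K result under a user-defined aggregation function and a skyline under Pareto dominance. It is a naive sequential baseline, not the parallel early-termination algorithms of the thesis; the names `top_k`, `dominates`, `skyline`, the sample `hotels` data, and the convention that larger attribute values are better are all assumptions made for the example.

```python
import heapq

def top_k(rows, k, score):
    """Return the k highest-scoring rows under a user-defined aggregation function."""
    return heapq.nlargest(k, rows, key=score)

def dominates(a, b):
    """True if a is at least as good as b on every attribute and strictly
    better on at least one (assumption: larger values are better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline(rows):
    """Naive O(n^2) skyline: keep the rows that no other row dominates."""
    return [r for r in rows if not any(dominates(o, r) for o in rows)]

# Hypothetical two-attribute tuples, e.g. (quality, affordability).
hotels = [(9, 2), (7, 7), (3, 9), (5, 5), (2, 3)]

# Top-2 under an equal-weight sum; the weights are an illustrative choice.
best_two = top_k(hotels, 2, score=lambda r: 0.5 * r[0] + 0.5 * r[1])

# Pareto-optimal trade-offs; no aggregation function is needed.
front = skyline(hotels)
```

The contrast is that Top-K depends on the chosen aggregation function, whereas the skyline depends only on the dominance relation between tuples.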
SkyLens: Visual analysis of skyline on multi-dimensional data
Skyline queries have wide-ranging applications in fields that involve
multi-criteria decision making, including tourism, retail industry, and human
resources. By automatically removing incompetent candidates, skyline queries
allow users to focus on a subset of superior data items (i.e., the skyline),
thus reducing the decision-making overhead. However, users are still required
to interpret and compare these superior items manually before making a
successful choice. This task is challenging because of two issues. First,
people usually have fuzzy, unstable, and inconsistent preferences when
presented with multiple candidates. Second, skyline queries do not reveal the
reasons for the superiority of certain skyline points in a multi-dimensional
space. To address these issues, we propose SkyLens, a visual analytic system
aiming at revealing the superiority of skyline points from different
perspectives and at different scales to aid users in their decision making. Two
scenarios demonstrate the usefulness of SkyLens on two datasets with a dozen
attributes. A qualitative study is also conducted to show that users can
efficiently accomplish skyline understanding and comparison tasks with SkyLens.
Comment: 10 pages. Accepted for publication at IEEE VIS 2017 (in proceedings
of VAST)
On Obtaining Stable Rankings
Decision making is challenging when there is more than one criterion to
consider. In such cases, it is common to assign a goodness score to each item
as a weighted sum of its attribute values and rank them accordingly. Clearly,
the ranking obtained depends on the weights used for this summation. Ideally,
one would want the ranked order not to change if the weights are changed
slightly. We call this property stability of the ranking. A consumer of a
ranked list may trust the ranking more if it has high stability. A producer of
a ranked list prefers to choose weights that result in a stable ranking, both
to earn the trust of potential consumers and because a stable ranking is
intrinsically likely to be more meaningful. In this paper, we develop a
framework that can be used to assess the stability of a provided ranking and to
obtain a stable ranking within an "acceptable" range of weight values (called
"the region of interest"). We address the case where the user cares about the
rank order of the entire set of items, and also the case where the user cares
only about the top-k items. Using a geometric interpretation, we propose
algorithms that produce stable rankings. In addition to theoretical analyses,
we conduct extensive experiments on real datasets that validate our proposal
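The notion of a ranking that survives small weight changes can be made concrete with a short Python sketch. It ranks items by a weighted sum and then estimates, by random sampling, how often slightly perturbed weights reproduce the same order. This sampling estimate is only an illustrative proxy and is not the geometric algorithm the paper proposes; the function names, the `radius` parameter, and the sample items are assumptions made for the example.

```python
import random

def rank(items, weights):
    """Order item names by descending weighted sum of their attribute values."""
    scores = {name: sum(w * v for w, v in zip(weights, vals))
              for name, vals in items.items()}
    return sorted(scores, key=scores.get, reverse=True)

def estimated_stability(items, weights, radius=0.05, trials=1000, seed=0):
    """Fraction of randomly perturbed weight vectors (each weight moved by at
    most `radius`, clipped at zero) that yield the same ranking as `weights`."""
    rng = random.Random(seed)
    base = rank(items, weights)
    same = 0
    for _ in range(trials):
        perturbed = [max(0.0, w + rng.uniform(-radius, radius)) for w in weights]
        if rank(items, perturbed) == base:
            same += 1
    return same / trials

# Hypothetical items with two normalized attributes each.
items = {"a": (0.9, 0.1), "b": (0.5, 0.6), "c": (0.2, 0.3)}
weights = (0.5, 0.5)

order = rank(items, weights)
stability = estimated_stability(items, weights)
```

A value near 1 suggests the order is insensitive to the weights within the sampled neighborhood, while a low value suggests that small weight changes reorder the items.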
Supporting case-based retrieval by similarity skylines: Basic concepts and extensions
Conventional approaches to similarity search and case-based
retrieval, such as nearest neighbor search, require the specification of a
global similarity measure which is typically expressed as an aggregation
of local measures pertaining to different aspects of a case. Since the
proper aggregation of local measures is often quite difficult, we propose a
novel concept called similarity skyline. Roughly speaking, the similarity
skyline of a case base is defined by the subset of cases that are most
similar to a given query in a Pareto sense. Thus, the idea is to proceed
from a d-dimensional comparison between cases in terms of d (local)
distance measures and to identify those cases that are maximally similar
in the sense of the Pareto dominance relation [2]. To refine the retrieval
result, we propose a method for computing maximally diverse subsets of
a similarity skyline. Moreover, we propose a generalization of similarity
skylines which is able to deal with uncertain data described in terms of
interval or fuzzy attribute values. The method is applied to similarity
search over uncertain archaeological data