14 research outputs found

    Continuous Top-k Dominating Queries in Subspaces

    Finding Top-k Dominance on Incomplete Big Data Using Map-Reduce Framework

    Incomplete data are multi-dimensional datasets with randomly distributed missing values in their dimensions. Retrieving information from such a dataset becomes very difficult as it grows large, and finding its top-k dominant values is a challenging procedure. Existing algorithms improve this process, but most are efficient only on small incomplete datasets. One algorithm that makes the top-k dominating (TKD) query practical is the Bitmap Index Guided (BIG) algorithm. BIG strongly improves performance on incomplete data, but it was not designed for, and is not capable of, finding top-k dominant values in incomplete big data. Other algorithms proposed for the TKD query, such as the Skyband Based and Upper Bound Based algorithms, also have questionable performance. These earlier algorithms were among the first attempts to apply the TKD query to incomplete data; however, all of them either performed weakly or were not compatible with incomplete data. This thesis proposes the MapReduced Enhanced Bitmap Index Guided Algorithm (MRBIG) to address these issues. MRBIG uses the MapReduce framework to improve the performance of top-k dominance queries on huge incomplete datasets. The framework divides the work among multiple computing nodes that operate independently and simultaneously to find the result. The method achieves up to two times faster processing than previously presented algorithms in finding the TKD query result.
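
To make the query discussed in this abstract concrete, the following is a minimal brute-force sketch of a top-k dominating query over incomplete data. It is not BIG or MRBIG. It assumes a commonly used dominance definition that the abstract does not spell out: objects are compared only on dimensions observed in both, smaller values are better, and `None` marks a missing value. The names `dominates` and `top_k_dominating` are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[Optional[float]]  # None marks a missing value (assumption)


def dominates(a: Point, b: Point) -> bool:
    """True if a dominates b on the dimensions observed in both.

    Assumed definition: a is no worse than b on every commonly observed
    dimension and strictly better on at least one (smaller is better).
    """
    strictly_better = False
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # skip dimensions missing in either object
        if x > y:
            return False
        if x < y:
            strictly_better = True
    return strictly_better


def top_k_dominating(points: List[Point], k: int) -> List[Tuple[int, int]]:
    """Return (index, score) for the k objects that dominate the most others."""
    scored = []
    for i, p in enumerate(points):
        score = sum(dominates(p, q) for j, q in enumerate(points) if j != i)
        scored.append((i, score))
    # Highest score first; ties are broken by the smaller index.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


points = [(1, None, 2), (2, 3, None), (None, 1, 3), (3, 4, 4)]
print(top_k_dominating(points, 2))  # [(0, 3), (2, 2)]
```

This baseline performs a quadratic number of pairwise comparisons, which is the kind of cost that index-based and parallel approaches aim to reduce.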

    Optimizing skyline query processing in incomplete data

    Skyline queries are significant enough to be incorporated in various modern applications, including personalized recommendation systems as well as decision-making and decision-support systems. They are used to identify superior data items in a database. Most previously proposed skyline algorithms work on a complete database, where the data are always present (non-missing). However, in many contemporary real-world databases, particularly those with large cardinality and high dimensionality, this assumption is not necessarily valid. Missing data therefore pose new challenges, because skyline query processing cannot easily apply methods designed for complete data: incomplete data break the transitivity of the dominance relation and can give rise to cyclic dominance. This paper presents a framework called Optimized Incomplete Skyline (OIS), which uses a technique that simplifies the skyline process on a database with missing data and helps prune data items before the skyline process is performed. The proposed strategy ensures that the number of domination tests is significantly reduced. A set of experiments on both real and synthetic datasets was carried out to validate the performance of the framework. The experimental results confirm that the OIS framework steadily outperforms current approaches in terms of the number of domination tests required to retrieve the skylines.
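
The loss of transitivity and the cyclic dominance mentioned in this abstract can be illustrated with three objects. This is a minimal sketch, not the OIS framework. It assumes that dominance is tested only on dimensions observed in both objects, that smaller values are better, and that `None` marks a missing value.

```python
from typing import Optional, Sequence

Point = Sequence[Optional[float]]  # None marks a missing value (assumption)


def dominates(a: Point, b: Point) -> bool:
    """Dominance restricted to dimensions observed in both objects."""
    strictly_better = False
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # dimension missing in at least one object
        if x > y:
            return False
        if x < y:
            strictly_better = True
    return strictly_better


a = (1, None, 3)
b = (2, 1, None)
c = (None, 2, 1)

print(dominates(a, b))  # True: compared on dimension 0 only
print(dominates(b, c))  # True: compared on dimension 1 only
print(dominates(a, c))  # False: transitivity does not hold
print(dominates(c, a))  # True: a -> b -> c -> a forms a cycle

# A naive skyline keeps every object that no other object dominates.
objects = [a, b, c]
skyline = [p for p in objects if not any(dominates(q, p) for q in objects if q is not p)]
print(skyline)  # []: every object in the cycle is dominated
```

Because each object in the cycle is dominated by another, the naive filter returns an empty skyline. It also shows why a dominated object cannot simply be discarded early, as it could on complete data: it may still be the only object that dominates some other candidate.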

    Top-k Dominating Queries on Incomplete Data

    Parallel and progressive approaches for skyline query over probabilistic incomplete database

    The advanced productivity of modern society has created a wide range of similar commodities. However, the descriptions of commodities are always incomplete, which makes it difficult for consumers to choose among them. The skyline query is a useful tool for this problem, but existing algorithms are unable to handle incomplete probabilistic databases. In addition, they must wait for the query to complete before returning even partial results. Furthermore, traditional skyline algorithms are usually serial and thus cannot use multi-core processors effectively. A parallel, progressive skyline query algorithm for incomplete databases, one that provides answers gradually and much faster, is therefore imperative. To address these problems, we design a new algorithm that uses multi-level grouping, pruning strategies, and pruning tuple transferring, which significantly decreases the computational costs. Experimental results demonstrate that the skyline results can be obtained in a short time. The parallel efficiency on an octa-core processor reaches 90% on high-dimensional, large databases.

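    Design of a Load-Sharing System on a Distributed Multi-Master Database for Massive Data Transactions [Perancangan Sistem Pembagian Beban pada Basis Data Multi-Master Terdistribusi untuk Massive Data Transaction]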

    The rapid development of information technology has led to ever wider use of computers and network devices. One use of this technology is a website that handles the recruitment of new members. Such a process requires fast data writing and retrieval and the ability to handle many requests at the same time. The most widely used method of data storage today is master-slave. It uses a single master node that accepts data writes, while several slave nodes copy data from the master node and serve data reads. The weakness of the master-slave method appears when large volumes of data are written simultaneously, which causes a bottleneck effect on the master node. To overcome this problem, a master-master method is used, in which all nodes can both write and read data. The nodes synchronize with one another so that the data on every node stay the same. The topology used to implement the master-master method requires more than one node to work properly. A load-sharing scheme is therefore needed to determine which node receives each request. The algorithm used to determine the destination is top-k, and it takes processor and memory usage into account when choosing the destination node.
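
As an illustration of the load-sharing idea in this abstract, the following sketch picks the k least-loaded nodes from processor and memory readings. The abstract does not say how the two metrics are combined, so the weighted "headroom" score, the `NodeLoad` type, and the node names are assumptions made only for this example.

```python
import heapq
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeLoad:
    name: str
    cpu_used: float  # fraction of CPU in use, between 0 and 1
    mem_used: float  # fraction of memory in use, between 0 and 1


def headroom(node: NodeLoad, cpu_weight: float = 0.5) -> float:
    """Hypothetical score: weighted share of free CPU and free memory."""
    return cpu_weight * (1 - node.cpu_used) + (1 - cpu_weight) * (1 - node.mem_used)


def pick_targets(nodes: List[NodeLoad], k: int) -> List[NodeLoad]:
    """Return the k nodes with the most headroom, best first."""
    return heapq.nlargest(k, nodes, key=headroom)


nodes = [
    NodeLoad("db1", cpu_used=0.90, mem_used=0.40),
    NodeLoad("db2", cpu_used=0.20, mem_used=0.30),
    NodeLoad("db3", cpu_used=0.50, mem_used=0.70),
]
print([n.name for n in pick_targets(nodes, 2)])  # ['db2', 'db3']
```

A dispatcher could forward each incoming write to the first node in this list and refresh the readings periodically; how often to refresh is a design choice the abstract does not discuss.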

    Top-k Dominating Queries on Incomplete Data

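    Design and Implementation of a Streaming-Based Top-K Dominating Query Processing Application Using Incomplete Data [Desain dan Implementasi Aplikasi Pengolahan Top-K Dominating Query Berbasis Streaming Menggunakan Incomplete Data]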

    Decision support applications play an important role in industry and many other fields. Examples include computer network monitoring, scientific data analysis, and sensor network management. Within decision support applications, the top-k dominating query has recently become an active research topic. A top-k dominating query returns the k objects with the highest domination scores in the available data. Monitoring query results while the data change frequently (a data stream) is a problem in its own right, because the computation should not be repeated from scratch every time new data arrive. In addition, real data are not always complete, so queries designed for complete data cannot be applied directly. This article addresses the processing of top-k dominating queries over streaming data that are incomplete. An event-based algorithm is proposed to meet these challenges. It is proposed because it shortens the computation time of each query, so that the top-k dominating query can be monitored continuously and efficiently. In tests on independent, anti-correlated, and forest covertype data, the method gave better computation times than the naive algorithm, with an average reduction in query time of 77.87%.
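
The idea of avoiding recomputation from scratch on every arrival can be sketched with a sliding window whose domination scores are updated incrementally. This is an illustrative baseline, not the event-based algorithm of the article. It assumes a count-based window, dominance tested only on commonly observed dimensions, smaller values being better, and `None` for missing values. The class name `StreamingTopK` is hypothetical.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple

Point = Sequence[Optional[float]]  # None marks a missing value (assumption)


def dominates(a: Point, b: Point) -> bool:
    """Dominance restricted to dimensions observed in both objects."""
    strictly_better = False
    for x, y in zip(a, b):
        if x is None or y is None:
            continue
        if x > y:
            return False
        if x < y:
            strictly_better = True
    return strictly_better


class StreamingTopK:
    """Count-based sliding window with incrementally maintained scores."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.window = deque()  # each entry is a list [point, score]

    def insert(self, point: Point) -> None:
        if len(self.window) == self.window_size:
            self._expire_oldest()
        entry = [point, 0]
        # Only pairs involving the new object are compared.
        for other in self.window:
            if dominates(point, other[0]):
                entry[1] += 1
            elif dominates(other[0], point):
                other[1] += 1
        self.window.append(entry)

    def _expire_oldest(self) -> None:
        old_point, _ = self.window.popleft()
        # Objects that dominated the expired one lose one point.
        for other in self.window:
            if dominates(other[0], old_point):
                other[1] -= 1

    def top_k(self, k: int) -> List[Tuple[tuple, int]]:
        ranked = sorted(self.window, key=lambda entry: -entry[1])
        return [(tuple(p), s) for p, s in ranked[:k]]


stream = StreamingTopK(window_size=3)
for p in [(1, None, 2), (2, 3, None), (None, 1, 3), (3, 4, 4)]:
    stream.insert(p)
print(stream.top_k(2))  # [((None, 1, 3), 2), ((2, 3, None), 1)]
```

Each arrival or expiry costs one pass over the window rather than a full pairwise recomputation.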

    Data Stream Management for System Observation [Gestion de flux de données pour l'observation de systèmes]

    The popularization of technology has put increasingly advanced devices and applications within reach of non-expert users. Such systems produce data streams as well as persistent data with heterogeneous schemas and dynamics. This thesis focuses on observing the data from these systems to help users understand and diagnose them. We first propose Astral, an algebraic model able to process data coming from streams or relations without semantic ambiguity. The execution engine Astronef has been developed on top of a service-oriented component architecture to allow great adaptability. It embeds a query builder that can select a composition of components to provide an efficient query plan. Its extension Asteroid interfaces with a DBMS in order to manage persistent data in an integrated manner. Our contributions have been tested in practice through the deployment of a monitoring system for the home network and through a performance study. Finally, we extend our approach with an operator that personalizes the results by introducing a top-k preference model.