138 research outputs found

    Energy-Efficient β

    The first priority of query processing in wireless sensor networks is to save the limited energy of sensor nodes, and in many sensing applications a part of the skyline result is enough for the user's requirements, so calculating the exact skyline is not energy-efficient. Therefore, this paper proposes a new approximate skyline query, the β-approximate skyline query, which is limited by a guaranteed error bound. To reduce the communication cost of evaluating β-approximate skyline queries, we also propose an energy-efficient processing algorithm using mapping and filtering strategies, named Actual Approximate Skyline (AAS). In addition, an extended algorithm named Hypothetical Approximate Skyline (HAS), which replaces the real tuples with hypothetical ones, is proposed to further reduce the communication cost. Extensive experiments on synthetic data demonstrate the efficiency and effectiveness of the proposed approaches under various experimental settings.
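    The abstract does not define the β error bound, so the following is only a minimal sketch of the exact skyline that the approximate query relaxes. It assumes smaller values are better in every dimension; the names `dominates`, `skyline`, and `readings` are illustrative and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-attribute sensor readings.
readings = [(1.0, 9.0), (2.0, 4.0), (3.0, 5.0), (5.0, 1.0), (6.0, 6.0)]
print(skyline(readings))  # [(1.0, 9.0), (2.0, 4.0), (5.0, 1.0)]
```

    An approximate variant would return fewer tuples than this exact result while staying within its error bound, which is where the communication saving would come from.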

    An Energy-Efficient Skyline Query for Massively Multidimensional Sensing Data

    Cyber-physical systems (CPS) sense the environment through wireless sensor networks. The sensing data of such systems are massive and multidimensional. As one of the major monitoring methods used in safe production monitoring and disaster early-warning applications, skyline query algorithms are extensively adopted for multiple-objective decision analysis of these sensing data. As networks grow, the amount of sensing data increases sharply, and improving the query efficiency of skyline query algorithms while reducing transmission energy consumption becomes a pressing and difficult problem. Therefore, this paper proposes a new energy-efficient skyline query method for massively multidimensional sensing data. First, instead of issuing queries with filters, the method uses a node-cut strategy to dynamically generate filtering tuples with little computational overhead while collecting query results. It can judge the domination relationship among different nodes, remove the detected data sets of dominated nodes that are irrelevant to the query, modify the query path dynamically, and reduce data comparison and computational overhead. The dynamic filter generated by this strategy causes little non-skyline data to be transmitted in the network, and the transmission distance is very short. Second, the method employs a tuple-cutting strategy inside each node: local cutting tuples are generated by the sub-tree rooted at the node itself and are used to cut the detected data within the nodes of that sub-tree, which further limits the uploading of non-skyline data. A large number of experimental results show that the method can quickly return an overview of the monitored area and reduce the communication overhead. Additionally, it can shorten the response time and improve the efficiency of the query.
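    The general idea of filtering with a tuple can be sketched as follows. This is not the paper's node-cut or tuple-cutting strategy, only a generic illustration under the assumption that smaller values are better: a node drops every local tuple that a received filter tuple dominates, so the dropped tuples are never uploaded. The function names and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse everywhere, strictly better somewhere
    (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def prune_with_filter(local_tuples: List[Point], filter_tuple: Point) -> List[Point]:
    """Keep only the local tuples that the filter tuple does not dominate."""
    return [t for t in local_tuples if not dominates(filter_tuple, t)]


# Hypothetical local data at one sensor node and a filter tuple received from elsewhere.
local_data = [(3.0, 3.0), (1.0, 5.0), (2.0, 2.0), (4.0, 1.0)]
to_upload = prune_with_filter(local_data, filter_tuple=(2.0, 2.0))
print(to_upload)  # [(1.0, 5.0), (2.0, 2.0), (4.0, 1.0)]
```

    Only tuples that cannot be in the final skyline are removed, so the result stays exact while fewer tuples travel through the network.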

    The 10th Jubilee Conference of PhD Students in Computer Science


    Mining and Managing User-Generated Content and Preferences

    In this thesis, we present techniques to manage the results of expressive queries, such as skyline queries, and to mine online content that has been generated by users. Given the numerous scenarios and applications where content mining can be applied, we focus in particular on two cases: review mining and social media analysis. More specifically, we focus on preference queries, where users can query a set of items, each associated with an attribute set. For each attribute, users can specify whether they prefer to minimize or maximize it, e.g., "minimize price", "maximize performance", etc. Such queries are also known as "Pareto optimal" or "skyline" queries. A drawback of this query type is that the result may become too large for the user to inspect manually. We propose an approach that addresses this issue by selecting a set of diverse skyline results. We provide a formal definition of skyline diversification and present efficient techniques to return such a set of points. The result can then be ranked according to established quality criteria. We also propose an alternative scheme for ranking skyline results, following an information retrieval approach.

    Parallel Continuous Preference Queries over Out-of-Order and Bursty Data Streams

    Techniques to handle traffic bursts and out-of-order arrivals are of paramount importance to provide real-time sensor data analytics in domains like traffic surveillance, transportation management, healthcare and security applications. In these systems the amount of raw data coming from sensors must be analyzed by continuous queries that extract value-added information used to make informed decisions in real-time. To perform this task with timing constraints, parallelism must be exploited in the query execution in order to enable the real-time processing on parallel architectures. In this paper we focus on continuous preference queries, a representative class of continuous queries for decision making, and we propose a parallel query model targeting the efficient processing over out-of-order and bursty data streams. We study how to integrate punctuation mechanisms in order to enable out-of-order processing. Then, we present advanced scheduling strategies targeting scenarios with different burstiness levels, parameterized using the index of dispersion quantity. Extensive experiments have been performed using synthetic datasets and real-world data streams obtained from an existing real-time locating system. The experimental evaluation demonstrates the efficiency of our parallel solution and its effectiveness in handling the out-of-orderness degrees and burstiness levels of real-world applications
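    As a rough illustration of how burstiness can be quantified, the sketch below computes an index of dispersion as the variance-to-mean ratio of per-window arrival counts. This is one common formulation and is assumed here; the paper may define the quantity differently (for example over inter-arrival times). The sample windows are hypothetical.

```python
from statistics import mean, pvariance
from typing import Sequence


def index_of_dispersion(counts: Sequence[float]) -> float:
    """Variance-to-mean ratio of arrival counts per time window.
    Values near 0 indicate steady traffic; large values indicate bursts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("index of dispersion is undefined for a zero mean")
    return pvariance(counts) / m


steady = [10, 10, 10, 10]  # same load in every window
bursty = [0, 0, 40, 0]     # same total load, concentrated in one window
print(index_of_dispersion(steady), index_of_dispersion(bursty))
```

    Both streams carry 40 tuples in total, but only the second has a high index, which is the kind of signal a scheduler could use to choose a strategy.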

    Parallel Processing Algorithms for Efficient Skyline Query Processing on Big Data

    Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2017. 심규석 (Kyuseok Shim).
    The skyline operator and its variants, such as dynamic skyline, reverse skyline and probabilistic skyline operators, have attracted considerable attention recently due to their broad applications. However, computing a skyline is challenging today since we have to deal with big data. For data-intensive applications, the MapReduce framework has been widely used recently. In this dissertation, we propose efficient parallel algorithms for processing skyline, dynamic skyline, reverse skyline and probabilistic skyline queries using MapReduce. For the skyline, dynamic skyline and reverse skyline queries, we first build quadtree-based histograms to prune out non-skyline points. We next partition the data based on the regions divided by the histograms and compute candidate skyline points for each partition using MapReduce. Finally, in every partition, we check whether each skyline candidate point is actually a skyline point, again using MapReduce. For the probabilistic skyline query, we first introduce three filtering techniques to prune out points that are not probabilistic skyline points. Then, we build a quadtree-based histogram and split the data into partitions according to the regions divided by the quadtree. We finally compute the probabilistic skyline points for each partition using MapReduce. We also develop workload balancing methods to make the estimated execution times of all available machines similar. We conducted experiments to compare our algorithms with the state-of-the-art algorithms using MapReduce and confirmed the effectiveness as well as the scalability of our proposed skyline algorithms.
    Contents:
    1 Introduction: 1.1 Motivation; 1.2 Contributions of This Dissertation; 1.3 Dissertation Overview
    2 Related Work: 2.1 Skyline Queries; 2.2 Reverse Skyline Queries; 2.3 Probabilistic Skyline Queries
    3 Background: 3.1 Skyline and Its Variants; 3.2 MapReduce Framework
    4 Parallel Skyline Query Processing: 4.1 SKY-MR: Our Skyline Computation Algorithm (4.1.1 SKY-QTREE: The Sky-Quadtree Building Algorithm; 4.1.2 L-SKY-MR: The Local Skyline Computation Algorithm; 4.1.3 G-SKY-MR: The Global Skyline Computation Algorithm); 4.2 Experiment (4.2.1 Performance Results for Skylines; 4.2.2 Performance Results in Other Environments)
    5 Parallel Reverse Skyline Query Processing: 5.1 RSKY-MR: Our Reverse Skyline Computation Algorithm (5.1.1 RSKY-QTREE: The Rsky-Quadtree Building Algorithm; 5.1.2 Computations of Reverse Skylines using Rsky-Quadtrees; 5.1.3 L-RSKY-MR: The Local Reverse Skyline Computation Algorithm; 5.1.4 G-RSKY-MR: The Global Reverse Skyline Computation Algorithm); 5.2 Experiment (5.2.1 Performance Results for Reverse Skylines)
    6 Parallel Probabilistic Skyline Query Processing: 6.1 Early Pruning Techniques (6.1.1 Upper-bound Filtering; 6.1.2 Zero-probability Filtering; 6.1.3 Dominance-Power Filtering); 6.2 Utilization of a PS-QTREE for Pruning (6.2.1 Generating a PS-QTREE; 6.2.2 Exploiting a PS-QTREE for Filtering; 6.2.3 Partitioning Objects by a PS-QTREE); 6.3 PS-QPF-MR: Our Algorithm with Quadtree Partitioning and Filtering (6.3.1 Optimizations of PS-QPF-MR; 6.3.2 Sample Size and Split Threshold of a PSQtree); 6.4 PS-BRF-MR: Our Algorithm with Random Partitioning and Filtering; 6.5 Experiments (6.5.1 Performance Results for Probabilistic Skylines)
    7 Conclusion
    Bibliography
    Abstract (In Korean)
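    The two-phase pattern described in the abstract (candidate skyline points per partition, then a global check) can be sketched in plain Python. This is a simplified, single-process illustration, not SKY-MR: it uses round-robin partitioning instead of quadtree-based histograms, no MapReduce runtime, and assumes smaller values are better. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def local_skyline(points: List[Point]) -> List[Point]:
    """Skyline of one partition (the work a single worker would do)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def partitioned_skyline(points: List[Point], num_partitions: int) -> List[Point]:
    """Phase 1: compute candidates per partition. Phase 2: filter the
    merged candidates to obtain the global skyline."""
    partitions = [points[i::num_partitions] for i in range(num_partitions)]
    candidates = [p for part in partitions for p in local_skyline(part)]
    return local_skyline(candidates)


data = [(1.0, 9.0), (2.0, 4.0), (3.0, 5.0), (5.0, 1.0), (6.0, 6.0), (4.0, 4.5), (2.5, 3.5)]
print(sorted(partitioned_skyline(data, num_partitions=3)))
```

    The result is exact because every global skyline point survives its own partition, and every other candidate is dominated by some global skyline point in the merge step.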

    Mobile capture of remote points of interest using line of sight modelling

    Recording points of interest using GPS whilst working in the field is an established technique in geographical fieldwork, where the user's current position is used as the spatial reference to be captured; this is known as geo-tagging. We outline the development and evaluation of a smartphone application called Zapp that enables geo-tagging of any distant point on the visible landscape. The ability of users to log or retrieve information relating to what they can see, rather than where they are standing, allows them to record observations of points in the broader landscape scene, or to access descriptions of landscape features from any viewpoint. The application uses the compass orientation and tilt of the phone to provide data for a line of sight algorithm that intersects with a Digital Surface Model stored on the mobile device. We describe the development process and design decisions for Zapp, present the results of a controlled study of the accuracy of the application, and report on the use of Zapp for a student field exercise. The studies indicate the feasibility of the approach, but also how the appropriate use of such techniques will be constrained by current levels of precision in mobile sensor technology. The broader implications for interactive query of the distant landscape and for remote data logging are discussed.
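    A line of sight intersection with a surface model can be sketched as a simple ray march. This is not Zapp's implementation; it assumes a regular grid `dsm[row][col]` of absolute elevations whose row index grows northward and column index grows eastward, a bearing measured clockwise from north, and a tilt measured upward from horizontal. All names and the sample terrain are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def line_of_sight_hit(dsm: List[List[float]], cell_size: float,
                      x: float, y: float, eye_z: float,
                      bearing_deg: float, tilt_deg: float,
                      step: float = 1.0,
                      max_range: float = 1000.0) -> Optional[Tuple[float, float, float]]:
    """March along the viewing ray and return (x, y, surface elevation) of
    the first sample at or below the surface, or None if the ray leaves
    the grid or exceeds max_range."""
    bearing = math.radians(bearing_deg)
    tilt = math.radians(tilt_deg)
    dx = math.sin(bearing) * math.cos(tilt)  # east component
    dy = math.cos(bearing) * math.cos(tilt)  # north component
    dz = math.sin(tilt)                      # vertical component
    dist = step
    while dist <= max_range:
        px, py, pz = x + dx * dist, y + dy * dist, eye_z + dz * dist
        col, row = int(px // cell_size), int(py // cell_size)
        if not (0 <= row < len(dsm) and 0 <= col < len(dsm[0])):
            return None  # ray left the modelled area
        if pz <= dsm[row][col]:
            return (px, py, dsm[row][col])
        dist += step
    return None


# Flat 5 x 5 terrain (10 m cells) with a 10 m high ridge in column 3.
terrain = [[10.0 if c == 3 else 0.0 for c in range(5)] for _ in range(5)]
hit = line_of_sight_hit(terrain, 10.0, x=5.0, y=25.0, eye_z=2.0,
                        bearing_deg=90.0, tilt_deg=0.0)
print(hit)
```

    The step size trades accuracy for speed, and errors in the compass and tilt readings move the hit point further as the target gets more distant, which matches the precision constraint the abstract mentions.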

    Efficient Processing of Ranking Queries in Novel Applications

    Ranking queries, which return only a subset of the results matching a user query, have been studied extensively in the past decade due to their importance in a wide range of applications. In this thesis, we study ranking queries in novel environments and settings where they have not been considered so far. With the advancements in sensor technologies, these small devices are today present in all corners of human life. Millions of them are deployed in various places and send data on a continuous basis. Sensors, which previously monitored mainly environmental phenomena or production chains, have now found their way into our daily lives as well; health monitoring is a plausible example of how much we rely on continuous observation of measurements. As Web technology evolves and facilitates data stream transmission, sensors are no longer the sole producers of data in the form of streams. Web 2.0 has escalated the production of user-generated content, which appears in the form of annotated posts in a Weblog (blog), pictures and videos, or small textual snippets reflecting the current activity or status of users, and can be regarded as natural items of a temporal stream. A major part of this thesis is devoted to developing novel methods that help keep track of this ever-increasing flow of information through continuous monitoring of ranking queries over it, particularly when traditional approaches fail to meet the newly raised requirements. We consider the ranking problem when the information flow is not synchronized among its sources. This is a recurring situation, since sensors are run by different organizations, measure moving entities, or are simply represented by users, who are inherently not synchronizable. Our methods are designed in particular for handling unsynchronized streams, calculating an object's score based on both its currently observed contribution to the registered queries and the contribution it might have in the future. While this uncertainty in score calculation causes linear growth in the space necessary for providing exact results, we are able to define criteria that allow unpromising objects to be evicted as early as possible. We also leverage statistical properties that reflect the correlation between multiple streams to predict future values and provide better bounds for the best possible contribution of an object, consequently limiting the necessary storage dramatically. To achieve this, we make use of small statistical synopses that are periodically refreshed during runtime. Furthermore, we consider user-generated queries in the context of Web 2.0 applications, which aim at filtering data streams in the form of textual documents based on personal interests. In this case, the dimensionality of the data, the large cardinality of the subscribed queries, and the desire for consuming recent information raise new challenges. We develop new approaches that efficiently filter the information and provide real-time updates to the user-subscribed queries. Our methods rely on a novel ordering of user queries in traditional inverted lists, which allows the system to effectively prune those queries for which a new piece of information is of no interest. Finally, we investigate high-quality search in user-generated content in Web 2.0 applications in the form of images or videos. These resources are inherently dispersed all over the globe and can therefore be best managed in a purely distributed peer-to-peer network, which eliminates single points of failure. Search in such a huge repository of high-dimensional data involves evaluating ranking queries in the form of nearest neighbor queries. Therefore, we study ranking queries in high-dimensional spaces, where the index of the objects is maintained in a purely distributed fashion. Our solution meets the two major requirements of a viable solution for distributing the index and evaluating ranking queries: the underlying peer-to-peer network remains load balanced, and efficient query evaluation is feasible because similar objects are assigned to nearby peers.
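    The eviction idea for unsynchronized streams can be illustrated with a simple bound check. This is a generic sketch, not the thesis's criteria: it assumes the score is a sum of non-negative per-stream contributions, each capped by a known `max_contribution`, and it keeps an object only if its best possible final score could still reach the current k-th best observed score. All names and numbers are hypothetical.

```python
import heapq
from typing import Dict, Set


def surviving_candidates(partial_scores: Dict[str, float],
                         seen_counts: Dict[str, int],
                         num_streams: int,
                         max_contribution: float,
                         k: int) -> Set[str]:
    """Return the objects that could still end up in the top-k."""
    if len(partial_scores) <= k:
        return set(partial_scores)
    # Lower bound: the score observed so far (future contributions are >= 0).
    threshold = heapq.nlargest(k, partial_scores.values())[-1]
    # Upper bound: every stream that has not reported yet contributes its maximum.
    upper = {
        obj: score + (num_streams - seen_counts[obj]) * max_contribution
        for obj, score in partial_scores.items()
    }
    return {obj for obj, bound in upper.items() if bound >= threshold}


scores = {"a": 25.0, "b": 18.0, "c": 4.0, "d": 5.0}
seen = {"a": 3, "b": 2, "c": 2, "d": 1}
keep = surviving_candidates(scores, seen, num_streams=3, max_contribution=10.0, k=2)
print(sorted(keep))  # ['a', 'b', 'd']
```

    Object "c" can reach at most 14, below the current second-best score of 18, so it can be dropped; tighter per-stream bounds, such as those derived from correlation statistics, would allow more objects to be dropped earlier.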