Search CORE

605 research outputs found

Continuous top-K monitoring on document streams (Extended abstract)

Author: LI Ye
MOURATIDIS Kyriakos
U Leong Hou
ZHANG Junjie
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2018
Field of study

Crossref

Institutional Knowledge at Singapore Management University

Verifiable and private top-k monitoring

Author: DING Xuhua
PANG Hwee Hwa
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013
Field of study

Institutional Knowledge at Singapore Management University

Secure server-aided top-k monitoring

Author: DING Xuhua
PANG Hwee Hwa
WANG Yujue
YANG Yanjiang
Publication venue: 'Elsevier BV'
Publication date: 01/12/2017
Field of study

National Research Foundation (NRF) Singapor

Crossref

Institutional Knowledge at Singapore Management University

Reverse spatial visual top-k query

Author: Song Jiayu
Yu Hao
Yu Weiren
Zhang Chengyuan
Zhang Zuping
Zhu Lei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 23/01/2020
Field of study

With the wide application of mobile Internet techniques an location-based services (LBS), massive multimedia data with geo-tags has been generated and collected. In this paper, we investigate a novel type of spatial query problem, named reverse spatial visual top-

k

query (RSVQ k ) that aims to retrieve a set of geo-images that have the query as one of the most relevant geo-images in both geographical proximity and visual similarity. Existing approaches for reverse top-

k

queries are not suitable to address this problem because they cannot effectively process unstructured data, such as image. To this end, firstly we propose the definition of RSVQ k problem and introduce the similarity measurement. A novel hybrid index, named VR 2 -Tree is designed, which is a combination of visual representation of geo-image and R-Tree. Besides, an extension of VR 2 -Tree, called CVR 2 -Tree is introduced and then we discuss the calculation of lower/upper bound, and then propose the optimization technique via CVR 2 -Tree for further pruning. In addition, a search algorithm named RSVQ k algorithm is developed to support the efficient RSVQ k query. Comprehensive experiments are conducted on four geo-image datasets, and the results illustrate that our approach can address the RSVQ k problem effectively and efficiently

Warwick Research Archives Portal Repository

Finding Top-k Dominance on Incomplete Big Data Using Map-Reduce Framework

Author: Ezatpoor Payam
Publication venue: Digital Scholarship@UNLV
Publication date: 01/05/2017
Field of study

Incomplete data is one major kind of multi-dimensional dataset that has random-distributed missing nodes in its dimensions. It is very difficult to retrieve information from this type of dataset when it becomes huge. Finding top-k dominant values in this type of dataset is a challenging procedure. Some algorithms are present to enhance this process but are mostly efficient only when dealing with a small-size incomplete data. One of the algorithms that make the application of TKD query possible is the Bitmap Index Guided (BIG) algorithm. This algorithm strongly improves the performance for incomplete data, but it is not originally capable of finding top-k dominant values in incomplete big data, nor is it designed to do so. Several other algorithms have been proposed to find the TKD query, such as Skyband Based and Upper Bound Based algorithms, but their performance is also questionable. Algorithms developed previously were among the first attempts to apply TKD query on incomplete data; however, all these had weak performances or were not compatible with the incomplete data. This thesis proposes MapReduced Enhanced Bitmap Index Guided Algorithm (MRBIG) for dealing with the aforementioned issues. MRBIG uses the MapReduce framework to enhance the performance of applying top-k dominance queries on huge incomplete datasets. The proposed approach uses the MapReduce parallel computing approach using multiple computing nodes. The framework separates the tasks between several computing nodes that independently and simultaneously work to find the result. This method has achieved up to two times faster processing time in finding the TKD query result in comparison to previously presented algorithms

University of Nevada, Las Vegas Repository

Cleaning uncertain data for top-k queries

Author: Cheng R
Cheung DWL
Li X
Mo L
Yang XS
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

The information managed in emerging applications, such as sensor networks, location-based services, and data integration, is inherently imprecise. To handle data uncertainty, probabilistic databases have been recently developed. In this paper, we study how to quantify the ambiguity of answers returned by a probabilistic top-k query. We develop efficient algorithms to compute the quality of this query under the possible world semantics. We further address the cleaning of a probabilistic database, in order to improve top-k query quality. Cleaning involves the reduction of ambiguity associated with the database entities. For example, the uncertainty of a temperature value acquired from a sensor can be reduced, or cleaned, by requesting its newest value from the sensor. While this 'cleaning operation' may produce a better query result, it may involve a cost and fail. We investigate the problem of selecting entities to be cleaned under a limited budget. Particularly, we propose an optimal solution and several heuristics. Experiments show that the greedy algorithm is efficient and close to optimal. © 2013 IEEE.published_or_final_versio

CiteSeerX

HKU Scholars Hub

Temporal search in document streams

Author: Kotsakos Dimitrios
Publication venue
Publication date: 27/10/2015
Field of study

In this thesis, we address major challenges in searching temporal document collections. In such collections, documents are created and/or edited over time. Examples of temporal document collections are web archives, news archives, blogs, personal emails and enterprise documents. Unfortunately, traditional IR approaches based on termmatching only can give unsatisfactory results when searching temporal document collections. The reason for this is twofold: the contents of documents are strongly time-dependent, i.e., documents are about events happened at particular time periods, and a query representing an information need can be time-dependent as well, i.e., a temporal query. On the other hand, time-only-based methods fall short when it comes to reasoning about events in social media. During the last few years users create chronologically ordered documents about topics that draw their attention in an ever increasing pace. However, with the vast adoption of social media, new types of marketing campaigns have been developed in order to promote content, i.e. brands, products, celebrities, etc

Digital Repository of Hellenic Managing Authority of the Operational Programme "Education and Lifelong Learning" (EDULLL)

Load shedding in network monitoring applications

Author: Barlet Ros Pere
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2008
Field of study

Monitoring and mining real-time network data streams are crucial operations for managing and operating data networks. The information that network operators desire to extract from the network traffic is of different size, granularity and accuracy depending on the measurement task (e.g., relevant data for capacity planning and intrusion detection are very different). To satisfy these different demands, a new class of monitoring systems is emerging to handle multiple and arbitrary monitoring applications. Such systems must inevitably cope with the effects of continuous overload situations due to the large volumes, high data rates and bursty nature of the network traffic. These overload situations can severely compromise the accuracy and effectiveness of monitoring systems, when their results are most valuable to network operators. In this thesis, we propose a technique called load shedding as an effective and low-cost alternative to over-provisioning in network monitoring systems. It allows these systems to handle efficiently overload situations in the presence of multiple, arbitrary and competing monitoring applications. We present the design and evaluation of a predictive load shedding scheme that can shed excess load in front of extreme traffic conditions and maintain the accuracy of the monitoring applications within bounds defined by end users, while assuring a fair allocation of computing resources to non-cooperative applications. The main novelty of our scheme is that it considers monitoring applications as black boxes, with arbitrary (and highly variable) input traffic and processing cost. Without any explicit knowledge of the application internals, the proposed scheme extracts a set of features from the traffic streams to build an on-line prediction model of the resource requirements of each monitoring application, which is used to anticipate overload situations and control the overall resource usage by sampling the input packet streams. This way, the monitoring system preserves a high degree of flexibility, increasing the range of applications and network scenarios where it can be used. Since not all monitoring applications are robust against sampling, we then extend our load shedding scheme to support custom load shedding methods defined by end users, in order to provide a generic solution for arbitrary monitoring applications. Our scheme allows the monitoring system to safely delegate the task of shedding excess load to the applications and still guarantee fairness of service with non-cooperative users. We implemented our load shedding scheme in an existing network monitoring system and deployed it in a research ISP network. We present experimental evidence of the performance and robustness of our system with several concurrent monitoring applications during long-lived executions and using real-world traffic traces.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura