
    Byzantine Modification Detection in Multicast Networks With Random Network Coding

    An information-theoretic approach for detecting Byzantine or adversarial modifications in networks employing random linear network coding is described. Each exogenous source packet is augmented with a flexible number of hash symbols that are obtained as a polynomial function of the data symbols. This approach depends only on the adversary not knowing the random coding coefficients of all other packets received by the sink nodes when designing its adversarial packets. We show how the detection probability varies with the overhead (ratio of hash to data symbols), the coding field size, and the amount of information about the random code that is unknown to the adversary.
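
    The construction lends itself to a short sketch: treat a packet's data symbols as coefficients of a polynomial over a prime field, evaluate it at a few fixed points, and append the evaluations as hash symbols that the sink recomputes after decoding. The field, the evaluation points, and the two-symbol overhead below are illustrative assumptions, not the paper's exact scheme; an adversary who must commit to a modification without knowing what the hash will be checked against can only pass the check by guessing, with probability shrinking in the field size and the number of hash symbols.

        # Hedged sketch: polynomial hash symbols appended to a source packet.
        # Field size, evaluation points, and the overhead of two symbols are
        # assumptions for illustration only.

        P = 2**31 - 1          # prime modulus defining the symbol field
        NUM_HASH = 2           # overhead: hash symbols appended per packet

        def hash_symbols(data, num_hash=NUM_HASH, p=P):
            """Evaluate the polynomial d0 + d1*x + d2*x^2 + ... at a few
            fixed points; the evaluations are the appended hash symbols."""
            out = []
            for x in range(2, 2 + num_hash):       # fixed evaluation points
                acc, power = 0, 1
                for d in data:
                    acc = (acc + d * power) % p
                    power = (power * x) % p
                out.append(acc)
            return out

        def augment(data):
            return data + hash_symbols(data)

        def verify(packet, num_hash=NUM_HASH):
            data, tags = packet[:-num_hash], packet[-num_hash:]
            return hash_symbols(data, num_hash) == tags

        pkt = augment([17, 42, 99, 3])
        assert verify(pkt)
        pkt[1] = (pkt[1] + 1) % P                  # adversarial modification
        assert not verify(pkt)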

    Reliable and randomized data distribution strategies for large scale storage systems

    The ever-growing amount of data requires highly scalable storage solutions. The most flexible approach is to use storage pools that can be expanded and scaled down by adding or removing storage devices. To make this approach usable, it is necessary to provide a solution for locating data items in such a dynamic environment. This paper presents and evaluates the Random Slicing strategy, which incorporates lessons learned from table-based, rule-based, and pseudo-randomized hashing strategies to provide a simple and efficient strategy that scales up to exascale data. Random Slicing keeps a small table with information about previous storage-system insert and remove operations, drastically reducing the required amount of randomness while delivering a perfect load distribution.
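
    As a rough illustration of the interval-based placement idea (a simplification: the published strategy re-cuts only existing intervals when devices are added or removed, so that little data moves and only a small table of cuts must be kept), the sketch below maps sub-intervals of [0, 1) to devices in proportion to capacity and hashes each data item to a point in that range.

        import bisect, hashlib

        # Simplified interval table in the spirit of Random Slicing; the
        # rebuild below reassigns all intervals, which the real strategy
        # avoids in order to minimise data movement.

        class IntervalTable:
            def __init__(self):
                self.bounds = []     # right endpoints of intervals in [0, 1)
                self.devices = []    # device owning each interval

            def rebuild(self, capacities):
                """capacities: {device: relative capacity}. Give each device
                a share of [0, 1) proportional to its capacity."""
                total = float(sum(capacities.values()))
                self.bounds, self.devices, acc = [], [], 0.0
                for dev, cap in sorted(capacities.items()):
                    acc += cap / total
                    self.bounds.append(acc)
                    self.devices.append(dev)
                self.bounds[-1] = 1.0            # close any rounding gap

            def locate(self, key):
                """Hash the key to a point in [0, 1); return the owner."""
                h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
                point = (h % 10**12) / 10**12
                idx = min(bisect.bisect_right(self.bounds, point),
                          len(self.devices) - 1)
                return self.devices[idx]

        table = IntervalTable()
        table.rebuild({"disk-a": 2, "disk-b": 1, "disk-c": 1})
        print(table.locate("block-4711"))        # e.g. 'disk-a'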

    Considering Complex Search Techniques in DHTs Under Churn

    Traditionally, complex queries have been performed over unstructured P2P networks by means of flooding, which is inherently inefficient due to the large number of redundant messages generated. While Distributed Hash Tables (DHTs) can provide very efficient look-up operations, they traditionally do not provide any methods for complex queries. By exploiting the structure inherent in DHTs, we can perform complex querying over structured P2P networks by efficiently broadcasting the search query. This allows every node in the network to process the query locally, and hence is as powerful and flexible as flooding in unstructured networks, but without the inefficiency of redundant messages. While various approaches have been proposed for broadcasting search queries over DHTs, the focus has not been on validation under churn. Comparing blind search methods for DHTs through simulation, we see that churn, in particular nodes leaving the network, has a large impact on query success rate. In this paper we present novel results comparing blind search over Chord and Pastry under varying levels of churn. We further consider how different data replication strategies can be used to enhance the query success rate.
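
    A minimal simulation of the kind of structured broadcast described above, assuming a Chord-style ring: each node forwards the query to its fingers, and every forward carries a limit that confines the receiving finger to a disjoint slice of the identifier space, so each node processes the query exactly once and no redundant messages are sent. The node identifiers and ring size are made up for the example; churn and replication are not modelled.

        # Toy Chord-style broadcast of a query with no duplicate deliveries.

        M = 6                                   # identifier space of size 2**M
        nodes = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])

        def successor(ident):
            """First live node at or after ident on the ring."""
            ident %= 2**M
            for n in nodes:
                if n >= ident:
                    return n
            return nodes[0]

        def fingers(node):
            """Chord fingers successor(node + 2**i), in ring order from node."""
            fs = {successor(node + 2**i) for i in range(M)} - {node}
            return sorted(fs, key=lambda f: (f - node) % 2**M)

        def in_range(x, lo, hi):
            """True if x lies strictly inside the ring interval (lo, hi)."""
            return (lo < x < hi) if lo < hi else (x > lo or x < hi)

        def broadcast(node, query, limit, visited):
            """Give each finger a disjoint slice of the ring, bounded by
            `limit`, so every live node receives the query exactly once."""
            visited.append(node)                # the query is processed here
            fl = fingers(node)
            for i, f in enumerate(fl):
                if not in_range(f, node, limit):
                    continue
                nxt = limit
                if i + 1 < len(fl) and in_range(fl[i + 1], node, limit):
                    nxt = fl[i + 1]             # next finger bounds this slice
                broadcast(f, query, nxt, visited)

        reached = []
        broadcast(nodes[0], "complex query", nodes[0], reached)
        assert sorted(reached) == nodes          # every node reached once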

    Dynamic Analysis of Algebraic Structure to Optimize Test Generation and Test Case Selection

    Where no independent specification is available, object-oriented unit testing is limited to exercising all interleaved method paths, seeking unexpected failures. A recent trend in unit testing, which interleaves dynamic analysis between test cycles, has brought useful reductions in test-set sizes by pruning redundant prefix paths. This paper describes a dynamic approach to analyzing the algebraic structure of test objects, such that prefix paths ending in observer or transformer operations yielding unchanged or derived states may be detected and pruned on the fly during testing. The few retained test cases are so close to the ideal algebraic specification cases that a tester can afford to confirm or reject them interactively; the confirmed cases are then used as a test oracle to predict many further test outcomes during automated testing. The algebra-inspired algorithms are incorporated in the latest version of the JWalk lazy systematic unit testing tool suite, which discovers key test cases while pruning many thousands of redundant test cases.
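
    A hypothetical Python analogue of the pruning idea (JWalk itself analyses Java classes and also records observer return values as oracle outcomes, which this sketch ignores): grow method-call prefixes breadth-first over a toy class and drop any prefix whose final state is unchanged or equal to a state already reached by a shorter retained prefix.

        # Prefix pruning by observed algebraic state (illustrative only).

        class Counter:                               # toy class under test
            def __init__(self): self.n = 0
            def inc(self): self.n = min(self.n + 1, 2)   # transformer, saturates
            def reset(self): self.n = 0                  # transformer to initial
            def value(self): return self.n               # observer, no state change

        def state_of(obj):
            return repr(vars(obj))                   # crude state signature

        def replay(cls, prefix):
            obj = cls()
            for name in prefix:
                getattr(obj, name)()
            return obj

        def explore(cls, methods, max_depth):
            """Retain only prefixes that reach a previously unseen state;
            prefixes ending in observers or reaching derived states are
            pruned on the fly."""
            seen = {state_of(cls())}
            frontier, retained = [[]], []
            for _ in range(max_depth):
                next_frontier = []
                for prefix in frontier:
                    for m in methods:
                        candidate = prefix + [m]
                        sig = state_of(replay(cls, candidate))
                        if sig in seen:              # redundant prefix: prune
                            continue
                        seen.add(sig)
                        next_frontier.append(candidate)
                        retained.append(candidate)
                frontier = next_frontier
            return retained

        print(explore(Counter, ["inc", "reset", "value"], max_depth=3))
        # keeps only ['inc'] and ['inc', 'inc'] out of 39 possible prefixes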

    Metadata-Aware Query Processing over Data Streams

    Many modern applications need to process queries over potentially infinite data streams to provide answers in real time. This dissertation proposes novel techniques to optimize CPU and memory utilization in stream processing by exploiting metadata on streaming data or queries. It focuses on four topics: 1) exploiting stream metadata to optimize SPJ query operators via operator configuration, 2) exploiting stream metadata to optimize SPJ query plans via query rewriting, 3) exploiting workload metadata to optimize parameterized queries via indexing, and 4) exploiting event constraints to optimize event stream processing via run-time early termination. The first part of this dissertation proposes algorithms for one of the most common and expensive query operators, namely join, to identify and purge no-longer-needed data from the operator state at runtime based on punctuations. Exploitation of the combination of punctuations and commonly used window constraints is also studied. Extensive experimental evaluations demonstrate both reductions in memory usage and improvements in execution time due to the proposed strategies. The second part proposes herald-driven runtime query plan optimization techniques. We identify four query optimization techniques and design a lightweight algorithm to efficiently detect optimization opportunities at runtime upon receiving heralds. We propose a novel execution paradigm to support multiple concurrent logical plans by maintaining one physical plan. An extensive experimental study confirms that our techniques significantly reduce query execution times. The third part deals with the shared execution of parameterized queries instantiated from a query template. We design a lightweight index mechanism to provide multiple access paths to data to facilitate a wide range of parameterized queries. To withstand workload fluctuations, we propose an index tuning framework to tune the index configurations in a timely manner. Extensive experimental evaluations demonstrate the effectiveness of the proposed strategies. The last part proposes event query optimization techniques that exploit event constraints, such as exclusiveness or ordering relationships among events, extracted from workflows. Significant performance gains are shown to be achieved by the proposed constraint-aware event processing techniques.
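
    As a minimal sketch of the punctuation idea from the first part, with an invented operator interface: a symmetric hash join that, on receiving a punctuation declaring a key finished on one stream, purges the stored tuples of the opposite stream for that key, since they can no longer join with anything that arrives later.

        from collections import defaultdict

        # Punctuation-aware symmetric hash join (simplified illustration; the
        # tuple format and method names are assumptions, not the dissertation's
        # actual operator interface).

        class PunctuatedJoin:
            def __init__(self):
                self.state = {"A": defaultdict(list), "B": defaultdict(list)}
                self.closed = {"A": set(), "B": set()}   # keys declared finished

            def on_tuple(self, stream, key, value):
                other = "B" if stream == "A" else "A"
                results = [(value, v) for v in self.state[other].get(key, [])]
                # Store only if the other stream may still produce matches.
                if key not in self.closed[other]:
                    self.state[stream][key].append(value)
                return results

            def on_punctuation(self, stream, key):
                """`stream` will emit no more tuples with `key`, so tuples of
                the other stream with that key can never match again and
                their state is purged."""
                other = "B" if stream == "A" else "A"
                self.closed[stream].add(key)
                self.state[other].pop(key, None)

        join = PunctuatedJoin()
        join.on_tuple("A", key=7, value="a1")
        print(join.on_tuple("B", key=7, value="b1"))    # [('b1', 'a1')]
        join.on_punctuation("A", key=7)                 # purge B-state for key 7
        print(7 in join.state["B"])                     # False: memory reclaimed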

    Optimization of Analytic Window Functions

    Analytic functions represent the state-of-the-art way of performing complex data analysis within a single SQL statement. In particular, an important class of analytic functions that has been frequently used in commercial systems to support OLAP and decision support applications is the class of window functions. A window function returns, for each input tuple, a value derived from applying a function over a window of neighboring tuples. However, existing window function evaluation approaches are based on a naive sorting scheme. In this paper, we study the problem of optimizing the evaluation of window functions. We propose several efficient techniques and identify optimization opportunities that allow us to optimize the evaluation of a set of window functions. We have integrated our scheme into PostgreSQL. Our comprehensive experimental study on the TPC-DS datasets as well as synthetic datasets and queries demonstrates significant speedup over existing approaches.
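
    The evaluation model is easy to mimic in a few lines (the column names and the two window specifications are invented for the example): sort the input once by partition and ordering key, then compute every window function that is compatible with that ordering over the sorted partitions. Sharing one sort across such a set of functions is the kind of optimization opportunity the paper targets, in contrast to naively re-sorting per function.

        from itertools import groupby
        from operator import itemgetter

        # Toy evaluation of two window functions that share one sort order.

        rows = [
            {"dept": "toys",  "emp": "a", "salary": 10},
            {"dept": "toys",  "emp": "b", "salary": 30},
            {"dept": "books", "emp": "c", "salary": 20},
            {"dept": "books", "emp": "d", "salary": 20},
        ]

        def windowed(rows, partition_key, order_key, funcs):
            """Sort once by (partition, order), then apply each compatible
            window function to every tuple of every partition."""
            ordered = sorted(rows, key=itemgetter(partition_key, order_key))
            out = []
            for _, group in groupby(ordered, key=itemgetter(partition_key)):
                part = list(group)
                for i, row in enumerate(part):
                    row = dict(row)
                    for name, f in funcs.items():
                        row[name] = f(part, i)
                    out.append(row)
            return out

        funcs = {
            # SUM(salary) OVER (PARTITION BY dept ORDER BY salary ROWS
            # UNBOUNDED PRECEDING): running sum over the sorted partition
            "running_sum": lambda part, i: sum(r["salary"] for r in part[: i + 1]),
            # RANK() OVER (PARTITION BY dept ORDER BY salary)
            "rank": lambda part, i: 1 + sum(r["salary"] < part[i]["salary"] for r in part),
        }

        for r in windowed(rows, "dept", "salary", funcs):
            print(r)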

    Towards an Architecture for Efficient Distributed Search of Multimodal Information

    The creation of very large-scale multimedia search engines, with more than one billion images and videos, is a pressing need of digital societies where data is generated by multiple connected devices. Distributing search indexes in cloud environments is the inevitable solution to deal with the increasing scale of image and video collections. The distribution of such indexes in this setting raises multiple challenges, such as the even partitioning of the data space, load balancing across index nodes, and the fusion of the results computed over multiple nodes. The main question behind this thesis is how to reduce and distribute the computational complexity of multimedia retrieval. This thesis studies the extension of sparse hash inverted indexing to distributed settings. The main goal is to ensure that indexes are uniformly distributed across computing nodes while keeping similar documents on the same nodes. Load balancing is performed at both the node and the index level, to guarantee that the retrieval process is not delayed by nodes that have to inspect larger subsets of the index. Multimodal search requires the combination of the search results from individual modalities and document features. This thesis studies rank fusion techniques focused on reducing complexity by automatically selecting only the features that improve retrieval effectiveness. The achievements of this thesis span both distributed indexing and rank fusion research. Experiments across multiple datasets show that sparse hashes can be used to distribute documents and queries across index entries in a balanced and redundant manner across nodes. Rank fusion results show that it is possible to reduce retrieval complexity and improve efficiency by searching only a subset of the feature indexes.
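
    A hypothetical sketch of the document-distribution idea, assuming a sparse hash built from random projections (the projection matrix, code sparsity, similarity score, and node assignment are illustrative choices, not the thesis's actual indexing scheme): each active entry of a document's sparse code selects an index node, so a document is replicated on a handful of nodes and a query only contacts the nodes selected by its own code.

        import numpy as np

        # Sparse-hash placement of documents on index nodes (illustrative).

        rng = np.random.default_rng(0)
        DIM, BITS, K, NODES = 64, 256, 4, 8
        projections = rng.standard_normal((BITS, DIM))   # shared by all nodes

        def sparse_hash(vec, k=K):
            """Indices of the k strongest projections: a sparse code whose
            active entries act as inverted-index buckets."""
            responses = projections @ vec
            return set(np.argsort(-responses)[:k])

        def owners(code):
            """Each active entry is owned by one node, so a document lands on
            at most k nodes, which also bounds the query fan-out."""
            return {int(entry) % NODES for entry in code}

        index = {n: {} for n in range(NODES)}            # node -> {doc_id: code}

        def add_document(doc_id, vec):
            code = sparse_hash(vec)
            for node in owners(code):
                index[node][doc_id] = code

        def search(query_vec, top=5):
            qcode = sparse_hash(query_vec)
            scores = {}
            for node in owners(qcode):                   # contact only a few nodes
                for doc_id, code in index[node].items():
                    scores[doc_id] = max(scores.get(doc_id, 0), len(qcode & code))
            return sorted(scores, key=scores.get, reverse=True)[:top]

        docs = {i: rng.standard_normal(DIM) for i in range(100)}
        for i, v in docs.items():
            add_document(i, v)
        print(search(docs[3] + 0.05 * rng.standard_normal(DIM)))  # 3 should rank first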