Search optimizations in structured peer-to-peer systems
DHT systems are structured overlay networks capable of using P2P resources as a scalable platform for very large data storage applications. However, their efficiency assumes a level of uniformity in the association of data to index keys that is often not present in inverted indexes. Index data tends to follow non-uniform distributions, often power law distributions, creating intense local storage hotspots and network bottlenecks on specific hosts. Current techniques like caching cannot, alone, cope with this issue. We propose a distributed data structure based on a decentralized balanced tree to balance storage data and network load more uniformly across hosts. The results show that the data structure is capable of balancing resources, in particular when performing multiple keyword searches.
Distributed top-k aggregation queries at large
Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
Computing infrastructure issues in distributed communications systems : a survey of operating system transport system architectures
The performance of distributed applications (such as file transfer, remote login, tele-conferencing, full-motion video, and scientific visualization) is influenced by several factors that interact in complex ways. In particular, application performance is significantly affected both by communication infrastructure factors and computing infrastructure factors. Communication infrastructure factors include channel speed, bit-error rate, and congestion at intermediate switching nodes. Computing infrastructure factors include (among other things) both protocol processing activities (such as connection management, flow control, error detection, and retransmission) and general operating system factors (such as memory latency, CPU speed, interrupt and context switching overhead, process architecture, and message buffering). Due to an increase of several orders of magnitude in network channel speed and an increase in application diversity, performance bottlenecks are shifting from the network factors to the transport system factors. This paper defines an abstraction called an "Operating System Transport System Architecture" (OSTSA) that is used to classify the major components and services in the computing infrastructure. End-to-end network protocols such as TCP, TP4, VMTP, XTP, and Delta-t typically run on general-purpose computers, where they utilize various operating system resources such as processors, virtual memory, and network controllers. The OSTSA provides services that integrate these resources to support distributed applications running on local and wide area networks. A taxonomy is presented to evaluate OSTSAs in terms of their support for protocol processing activities. We use this taxonomy to compare and contrast five general-purpose commercial and experimental operating systems including System V UNIX, BSD UNIX, the x-kernel, Choices, and Xinu.
Handling Network Partitions and Mergers in Structured Overlay Networks
Structured overlay networks form a major class of peer-to-peer systems, which are touted for their abilities to
scale, tolerate failures, and self-manage. Any long-lived
Internet-scale distributed system is destined to face network partitions. Although the problem of network partitions
and mergers is highly related to fault-tolerance and
self-management in large-scale systems, it has hardly been
studied in the context of structured peer-to-peer systems.
These systems have mainly been studied under churn (frequent
joins/failures), which as a side effect solves the problem
of network partitions, as it is similar to massive node
failures. Yet, the crucial aspect of network mergers has been
ignored. In fact, it has been claimed that ring-based structured
overlay networks, which constitute the majority of the
structured overlays, are intrinsically ill-suited for merging
rings. In this paper, we present an algorithm for merging
multiple similar ring-based overlays when the underlying
network merges. We examine the solution in dynamic conditions,
showing how our solution is resilient to churn during
the merger, something widely believed to be difficult or
impossible. We evaluate the algorithm for various scenarios
and show that even when falsely detecting a merger, the
algorithm quickly terminates and does not clutter the network
with many messages. The algorithm is flexible as the
tradeoff between message complexity and time complexity
can be adjusted by a parameter.
Socially-Aware Distributed Hash Tables for Decentralized Online Social Networks
Many decentralized online social networks (DOSNs) have been proposed due to
an increase in awareness related to privacy and scalability issues in
centralized social networks. Such decentralized networks transfer processing
and storage functionalities from the service providers towards the end users.
DOSNs require individualistic implementation for services (i.e., search,
information dissemination, storage, and publish/subscribe). However, many of
these services mostly perform social queries, where OSN users are interested in
accessing information of their friends. In our work, we design a socially-aware
distributed hash table (DHT) for efficient implementation of DOSNs. In
particular, we propose a gossip-based algorithm to place users in a DHT, while
maximizing the social awareness among them. Through a set of experiments, we
show that our approach reduces the lookup latency by almost 30% and improves
the reliability of the communication by nearly 10% via trusted contacts.
Comment: 10 pages, p2p 2015 conferenc
A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Applications
Heterogeneous High-Performance Computing
(HPC) platforms present a significant programming challenge,
especially because the key users of HPC resources are scientists,
not parallel programmers. We contend that compiler technology
has to evolve to automatically create the best program variant
by transforming a given original program. We have developed a
novel methodology based on type transformations for generating
correct-by-construction design variants, and an associated
light-weight cost model for evaluating these variants for
implementation on FPGAs. In this paper we present a key
enabler of our approach, the cost model. We discuss how we
are able to quickly derive accurate estimates of performance
and resource-utilization from the design’s representation in our
intermediate language. We show results confirming the accuracy
of our cost model by testing it on three different scientific
kernels. We conclude with a case-study that compares a solution
generated by our framework with one from a conventional
high-level synthesis tool, showing better performance and
power-efficiency using our cost-model-based approach.
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for an optimized
solution to a specific real-world problem, big data systems are no exception
to this rule. As far as the storage aspect of any big data system is
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents a
feature and use-case analysis and comparison of the four main data models,
namely document-oriented, key-value, graph, and wide-column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage, and their challenges and future prospects are
also discussed.