56 research outputs found
Distributed, Secure Load Balancing with Skew, Heterogeneity, and Churn
Numerous proposals exist for load balancing in peer-to-peer (p2p) networks. Some focus on namespace balancing, making the distance between nodes as uniform as possible. This technique works well under ideal conditions, but not under those found empirically. Instead, researchers have found heavy-tailed query distributions (skew), high rates of node join and leave (churn), and wide variation in node network and storage capacity (heterogeneity). Other approaches tackle these less-than-ideal conditions, but give up on important security properties. We propose an algorithm that achieves good performance without diluting security. Our algorithm, k-Choices, achieves load balance by greedily matching nodes’ target workloads with actual applied workloads through limited sampling, and limits any fundamental decrease in security by basing each node’s set of potential identifiers on a single certificate. Our algorithm compares favorably to four others in trace-driven simulations. We have implemented our algorithm and found that it improved aggregate throughput by 20% in a widely heterogeneous system in our experiments.
Engineering and Applied Science
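The core of the k-Choices idea, as the abstract describes it, can be sketched in a few lines: a node derives a limited set of candidate identifiers from a single certificate, samples the workload each candidate would receive, and greedily picks the one closest to its target. This is a minimal sketch, assuming a hypothetical hash-based ID derivation and a caller-supplied `sample_load` probe; it is not the paper's actual implementation.

```python
import hashlib

def candidate_ids(cert: bytes, k: int) -> list[int]:
    """Derive k verifiable identifiers from one certificate (hypothetical
    derivation: hash of the certificate plus a counter). Tying IDs to a
    certificate bounds how many identities a node can claim."""
    return [
        int.from_bytes(hashlib.sha256(cert + i.to_bytes(4, "big")).digest()[:8], "big")
        for i in range(k)
    ]

def choose_id(cert: bytes, k: int, target_load: float, sample_load) -> int:
    """Greedily pick the candidate ID whose sampled applied workload is
    closest to this node's target workload."""
    ids = candidate_ids(cert, k)
    return min(ids, key=lambda i: abs(sample_load(i) - target_load))
```

In a real deployment `sample_load` would involve limited network probing of the region each candidate ID owns; here it is just a function argument.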
Passive NFS Tracing of Email and Research Workloads
We present an analysis of a pair of NFS traces of contemporary email and research workloads. We show that although the research workload resembles previously studied workloads, the email workload is quite different. We also perform several new analyses that demonstrate the periodic nature of file system activity, the effect of out-of-order NFS calls, and the strong relationship between the name of a file and its size, lifetime, and access pattern.
Stable and Accurate Network Coordinates
Synthetic coordinate systems that mirror latencies between physical hosts have become a part of the toolbox networking researchers would like to use in real deployments. However, the most promising algorithm for building these coordinate systems, Vivaldi, breaks down when run under real-world conditions. Previous work on network coordinates has examined their performance in simulation through the use of a latency matrix, which summarizes each link with a single latency. In a deployment, instead of perceiving a single latency for each link, nodes see a stream of distinct observations that may vary by as much as three orders of magnitude. With no means to discern an appropriate latency for each link, coordinate systems are prone to high error and instability in live deployments. Two simple enhancements improved Vivaldi’s accuracy by 54% and coordinate stability by 96% when run on a real large-scale network. First, we use a non-linear low-pass filter to ascertain a clear underlying signal from each link. These filters primarily improve accuracy. Second, we introduce a distinction between system- and application-level coordinates. We evaluate a set of change-detection heuristics that allow coordinates to evolve at the system level and only initiate an application-level update after a coordinate has undergone a significant change. These application-level coordinates retain the filter’s high accuracy and dramatically increase coordinate stability.
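The non-linear low-pass filter described above can be illustrated with a moving-percentile filter over a short window of RTT samples: reporting a low percentile tracks the baseline latency of a link while ignoring the large spikes that dominate raw observations. This is a sketch under that assumption; the window size and percentile are illustrative, not the paper's tuned values.

```python
from collections import deque

class PercentileFilter:
    """Non-linear low-pass filter for a latency stream: keep the last
    `window` RTT samples and report a low percentile, so transient
    spikes (which can be orders of magnitude above baseline) are
    discarded rather than averaged in."""

    def __init__(self, window: int = 4, percentile: float = 0.25):
        self.samples = deque(maxlen=window)  # oldest samples fall off
        self.percentile = percentile

    def update(self, rtt_ms: float) -> float:
        """Add one observation and return the filtered link latency."""
        self.samples.append(rtt_ms)
        ordered = sorted(self.samples)
        idx = min(int(self.percentile * len(ordered)), len(ordered) - 1)
        return ordered[idx]
```

A linear filter (e.g. an exponentially weighted average) would instead let a single 500 ms spike drag the estimate well above the link's true 10 ms baseline, which is why a rank-based filter suits this workload.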
Organic Indoor Location Discovery
We describe an indoor, room-level location discovery method based on spatial variations in "wifi signatures," i.e., MAC addresses and signal strengths of existing wireless access points. The principal novelty of our system is its organic nature; it builds signal strength maps from the natural mobility and lightweight contributions of ordinary users, rather than dedicated effort by a team of site surveyors. Whenever a user's personal device observes an unrecognized signature, a GUI solicits the user's location. The resulting location-tagged signature or "bind" is then shared with other clients through a common database, enabling devices subsequently arriving there to discover location with no further user contribution.
Realizing a working system deployment required three novel elements: (1) a human-computer interface for indicating location over intervals of varying duration; (2) a client-server protocol for pre-fetching signature data for use in localization; and (3) a location-estimation algorithm incorporating highly variable signature data. We describe an experimental deployment of our method in a nine-story building with more than 1,400 distinct spaces served by more than 200 wireless access points. At the conclusion of the deployment, users could correctly localize to within 10 meters 92 percent of the time.
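The localization step can be sketched as nearest-neighbor matching between an observed wifi signature (MAC address to signal strength) and the user-contributed "binds" in the shared database. This is a deliberately naive sketch; the room names and access-point identifiers are hypothetical, and the deployed estimator additionally copes with missing and highly variable readings.

```python
def match_room(observation: dict, binds: dict) -> str:
    """Return the room whose recorded signature is closest to the
    observed one, by mean squared RSSI difference over the access
    points the two signatures share."""
    def dist(sig: dict) -> float:
        common = observation.keys() & sig.keys()
        if not common:
            return float("inf")  # no shared APs: cannot compare
        return sum((observation[m] - sig[m]) ** 2 for m in common) / len(common)

    return min(binds, key=lambda room: dist(binds[room]))
```

Because contributions are organic, `binds` grows only where users actually go, so coverage mirrors real building usage rather than a surveyor's grid.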
Evaluating DHT-Based Service Placement for Stream Based Overlays
Stream-based overlay networks (SBONs) are one approach to implementing large-scale stream processing systems. A fundamental consideration in an SBON is that of service placement, which determines the physical location of in-network processing services or operators, in such a way that network resources are used efficiently. Service placement consists of two components: node discovery, which selects a candidate set of nodes on which services might be placed, and node selection, which chooses the particular node to host a service. By viewing the placement problem as the composition of these two processes we can trade off quality and efficiency between them. A bad discovery scheme can still yield a good placement, but at the cost of an expensive selection mechanism.
Recent work on operator placement [3, 9] proposes to leverage routing paths in a distributed hash table (DHT) to obtain a set of candidate nodes for service placement. We evaluate the appropriateness of using DHT routing paths for service placement in an SBON, when aiming to minimize network usage. For this, we consider two DHT-based algorithms for node discovery, which use either the union or intersection of DHT routing paths in the SBON, and compare their performance to other techniques. We show that current DHT-based schemes are actually rather poor node discovery algorithms when minimizing network utilization. An efficient DHT may not traverse enough hops to obtain a sufficiently large candidate set for placement. The union of DHT routes may result in a low-quality set of discovered nodes that requires an expensive node selection algorithm. Finally, the intersection of DHT routes relies on route convergence, which prevents the placement of services with a large fan-in.
Engineering and Applied Science
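The two DHT-based discovery schemes compared above reduce to simple set operations over routing paths. The sketch below, with hypothetical node names, makes their failure modes concrete: the union can balloon into a large, low-quality candidate set, while the intersection can shrink to nothing when routes from many producers fail to converge.

```python
def candidates_union(routes: list) -> set:
    """Node discovery via the union of DHT routing paths: every node on
    any producer-to-consumer route becomes a placement candidate."""
    out = set()
    for path in routes:
        out |= set(path)
    return out

def candidates_intersection(routes: list) -> set:
    """Node discovery via the intersection of routes: only nodes where
    all routes converge qualify, which may be empty with large fan-in."""
    sets = [set(p) for p in routes]
    result = sets[0]
    for s in sets[1:]:
        result &= s
    return result
```

With two producers routing toward a common consumer, the union returns every hop on either path while the intersection keeps only the shared tail of the routes.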
Provenance-Aware Sensor Data Storage
Sensor network data has both historical and real-time value. Making historical sensor data useful, in particular, requires storage, naming, and indexing. Sensor data presents new challenges in these areas. Such data is location-specific but also distributed; it is collected in a particular physical location and may be most useful there, but it has additional value when combined with other sensor data collections in a larger distributed system. Thus, arranging location-sensitive peer-to-peer storage is one challenge. Sensor data sets do not have obvious names, so naming them in a globally useful fashion is another challenge. The last challenge arises from the need to index these sensor data sets to make them searchable. The key to sensor data identity is provenance, the full history or lineage of the data. We show how provenance addresses the naming and indexing issues and then present a research agenda for constructing distributed, indexed repositories of sensor data.
Engineering and Applied Science