Stocator: A High Performance Object Store Connector for Spark

Author: Vernik Gil
Factor Michael
Kolodner Elliot K.
Michiardi Pietro
Ofer Effi
Pace Francesco
Publication date: 08/08/2017

We present Stocator, a high performance object store connector for Apache Spark that takes advantage of object store semantics. Previous connectors have assumed file system semantics; in particular, they achieve fault tolerance and allow speculative execution by creating temporary files, which avoids interference between worker threads executing the same task, and then renaming these files. Rename is not a native object store operation; not only is it not atomic, but it is implemented using a costly copy operation and a delete. Instead, our connector leverages the inherent atomicity of object creation. By avoiding the rename paradigm, it greatly decreases the number of operations on the object store and enables a much simpler approach to dealing with the eventually consistent semantics typical of object stores. We have implemented Stocator and shared it in open source. Performance testing shows that it is as much as 18 times faster for write intensive workloads and performs as much as 30 times fewer operations on the object store than the legacy Hadoop connectors, reducing costs both for the client and the object storage service provider.
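
The operation-count argument in this abstract can be illustrated with a toy model. The sketch below is not Stocator's implementation: `CountingStore`, the `_temporary/` prefix, and both write helpers are hypothetical names chosen for illustration, and the store is an in-memory dictionary that only counts requests. It assumes a rename costs one copy plus one delete, as the abstract describes.

```python
class CountingStore:
    """Toy in-memory object store that counts the requests made against it."""

    def __init__(self):
        self.objects = {}
        self.ops = 0

    def put(self, key, data):
        self.ops += 1
        self.objects[key] = data

    def copy(self, src, dst):
        self.ops += 1
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        self.ops += 1
        del self.objects[key]


def write_with_rename(store, key, data):
    # File-system style commit: write a temporary object, then "rename" it.
    tmp = "_temporary/" + key  # hypothetical temporary name
    store.put(tmp, data)
    store.copy(tmp, key)       # rename, step 1: copy to the final name
    store.delete(tmp)          # rename, step 2: delete the temporary object


def write_direct(store, key, data):
    # Object-store style commit: one creation under the final name.
    store.put(key, data)


parts = {f"part-{i}": b"x" for i in range(4)}

renaming = CountingStore()
for key, data in parts.items():
    write_with_rename(renaming, key, data)

direct = CountingStore()
for key, data in parts.items():
    write_direct(direct, key, data)

print("rename-based ops:", renaming.ops)
print("direct-write ops:", direct.ops)
```

Both stores end up holding the same objects; only the number of requests differs. The toy model shows the per-object overhead of emulating rename and says nothing about the measured speedups reported in the abstract.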

arXiv.org e-Print Archive

FigShare

Determining WWW User's Next Access and Its Application to Pre-fetching

Author: Cunha Carlos R.
Jaccoud Carlos F.B.
Publication venue: Boston University Computer Science Department
Publication date: 26/03/1997

World-Wide Web (WWW) services have grown to levels where significant delays are expected to happen. Techniques like pre-fetching are likely to help users to personalize their needs, reducing their waiting times. However, pre-fetching is only effective if the right documents are identified and if the user's next move is correctly predicted. Otherwise, pre-fetching will only waste bandwidth. Therefore, it is productive to determine whether a revisit will occur or not before starting pre-fetching. In this paper we develop two user models that help determine the user's next move. One model uses a Random Walk approximation and the other is based on Digital Signal Processing techniques. We also give hints on how to use such models with a simple pre-fetching technique that we are developing. CNP

Boston University Institutional Repository (OpenBU)

On the Intrinsic Locality Properties of Web Reference Streams

Author: Abrahão Bruno
Almeida Virgílio
Crovella Mark
Fonseca Rodrigo
Publication venue: Boston University Computer Science Department
Publication date: 13/08/2002

There has been considerable work done in the study of Web reference streams: sequences of requests for Web objects. In particular, many studies have looked at the locality properties of such streams, because of the impact of locality on the design and performance of caching and prefetching systems. However, a general framework for understanding why reference streams exhibit given locality properties has not yet emerged. In this work we take a first step in this direction, based on viewing the Web as a set of reference streams that are transformed by Web components (clients, servers, and intermediaries). We propose a graph-based framework for describing this collection of streams and components. We identify three basic stream transformations that occur at nodes of the graph: aggregation, disaggregation and filtering, and we show how these transformations can be used to abstract the effects of different Web components on their associated reference streams. This view allows a structured approach to the analysis of why reference streams show given properties at different points in the Web. Applying this approach to the study of locality requires good metrics for locality. These metrics must meet three criteria: 1) they must accurately capture temporal locality; 2) they must be independent of trace artifacts such as trace length; and 3) they must not involve manual procedures or model-based assumptions. We describe two metrics meeting these criteria that each capture a different kind of temporal locality in reference streams. The popularity component of temporal locality is captured by entropy, while the correlation component is captured by interreference coefficient of variation. We argue that these metrics are more natural and more useful than previously proposed metrics for temporal locality. We use this framework to analyze a diverse set of Web reference traces. 
We find that this framework can shed light on how and why locality properties vary across different locations in the Web topology. For example, we find that filtering and aggregation have opposing effects on the popularity component of the temporal locality, which helps to explain why multilevel caching can be effective in the Web. Furthermore, we find that all transformations tend to diminish the correlation component of temporal locality, which has implications for the utility of different cache replacement policies at different points in the Web. National Science Foundation (ANI-9986397, ANI-0095988); CNPq-Brazi
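
The two locality metrics named in this abstract can be sketched as follows. The exact definitions used in the paper are not given here, so this is a minimal illustration under stated assumptions: entropy is taken as the Shannon entropy (in bits) of the empirical object popularity distribution, and the interreference coefficient of variation is taken as the standard deviation divided by the mean of the gaps, measured in number of references, between successive requests for the same object. The function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def popularity_entropy(stream):
    """Shannon entropy (bits) of the empirical popularity distribution."""
    counts = Counter(stream)
    total = len(stream)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def interreference_cov(stream):
    """Coefficient of variation of gaps between repeat references.

    A gap is the number of positions between two successive requests
    for the same object (assumed definition, see the text above).
    """
    last_seen = {}
    gaps = []
    for position, obj in enumerate(stream):
        if obj in last_seen:
            gaps.append(position - last_seen[obj])
        last_seen[obj] = position
    if not gaps:
        raise ValueError("stream contains no repeated references")
    return pstdev(gaps) / mean(gaps)


# Two objects requested in strict alternation.
stream = ["a", "b", "a", "b"]
print(popularity_entropy(stream))
print(interreference_cov(stream))
```

Under these assumptions, lower entropy means requests are concentrated on fewer objects, and the coefficient of variation summarizes how irregular the gaps between repeat requests are. Neither function requires a model of the trace or manual tuning, in line with the third criterion listed in the abstract.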

Boston University Institutional Repository (OpenBU)

Internet economics and policy: An Australian perspective

Author: Coble-Neal Grant
Madden Gary G

Publicly available information indicates that the demand for and supply of Internet and Internet-related services are continuing to expand at a rapid pace. Since 1997 the number of Internet service providers (facilities-based and resellers) has increased by nearly 40 per cent; the number of points-of-presence per Internet service provider has increased by five times; the number of hosts connected to the Internet has more than quadrupled; and Internet traffic has increased by six to 10 times. The emergence of electronic commerce (e-commerce), driven by this rapid adoption of Internet services and continual technological innovation, is likely to have profound economic and social impacts on Australian society. This paper provides a detailed analysis of the impact of the Internet and e-commerce, ranging from the changes in the market structure of the telecommunications industry and its role in changing the organisation of traditional markets to the emergence of new markets and the structural shifts in employment, productivity and trade. The paper also analyses contemporary Australian regulatory responses. e-commerce; internet economics

The Network Effects of Prefetching

Author: Crovella Mark
Barford Paul
Publication venue: Boston University Computer Science Department
Publication date: 01/01/1997

Prefetching has been shown to be an effective technique for reducing user-perceived latency in distributed systems. In this paper we show that even when prefetching adds no extra traffic to the network, it can have serious negative performance effects. Straightforward approaches to prefetching increase the burstiness of individual sources, leading to increased average queue sizes in network switches. However, we also show that applications can avoid the undesirable queueing effects of prefetching. In fact, we show that applications employing prefetching can significantly improve network performance, to a level much better than that obtained without any prefetching at all. This is because prefetching offers increased opportunities for traffic shaping that are not available in the absence of prefetching. Using a simple transport rate control mechanism, a prefetching application can modify its behavior from a distinctly ON/OFF entity to one whose data transfer rate changes less abruptly, while still delivering all data in advance of the user's actual requests.
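
The link between burstiness and queue size described in this abstract can be illustrated with a toy single-queue model. This is not the paper's simulation: the discrete time slots, the arrival patterns, the link capacity, and the function name `average_queue` are all assumptions made for the sketch. Both sources send the same total amount of data; only the shape of the transfer differs.

```python
def average_queue(arrivals, capacity):
    """Average backlog of a queue drained at `capacity` units per slot."""
    queue = 0
    total = 0
    for arrived in arrivals:
        # Work arrives, then the link serves up to `capacity` units.
        queue = max(0, queue + arrived - capacity)
        total += queue
    return total / len(arrivals)


# ON/OFF source: everything is sent in a burst, then the source is idle.
bursty = [8, 0, 0, 0, 8, 0, 0, 0]
# Rate-controlled source: the same data spread evenly over the period.
smooth = [2] * 8

capacity = 3  # assumed link capacity per time slot

print(average_queue(bursty, capacity))
print(average_queue(smooth, capacity))
```

In this toy setting the bursty source keeps a backlog in the queue while the smoothed source never builds one, even though the total traffic is identical. The sketch does not model request deadlines; in the abstract's setting it is prefetching that gives the application the slack to send data early at a lower rate.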

Boston University Institutional Repository (OpenBU)