Optimization in Web Caching: Cache Management, Capacity Planning, and Content Naming
Caching is fundamental to performance in distributed information retrieval systems
such as the World Wide Web. This thesis introduces novel techniques for optimizing performance
and cost-effectiveness in Web cache hierarchies.
When requests are served by nearby caches rather than distant servers, server loads and
network traffic decrease and transactions are faster. Cache system design and management,
however, face extraordinary challenges in loosely-organized environments like the Web,
where the many components involved in content creation, transport, and consumption are
owned and administered by different entities. Such environments call for decentralized
algorithms in which stakeholders act on local information and private preferences.
In this thesis I consider problems of optimally designing new Web cache hierarchies
and optimizing existing ones. The methods I introduce span the Web from point of content
creation to point of consumption: I quantify the impact of content-naming practices on
cache performance; present techniques for variable-quality-of-service cache management;
describe how a decentralized algorithm can compute economically-optimal cache sizes in
a branching two-level cache hierarchy; and introduce a new protocol extension that eliminates
redundant data transfers and allows "dynamic" content to be cached consistently.
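The duplicate-suppression idea behind such a protocol extension can be illustrated with a small sketch. The following Python toy keys a client cache by payload digest, so identical bytes served under different URLs, or regenerated as "dynamic" content, are transferred only once. All names, including the `head_digest` probe on the server object, are hypothetical illustrations, not the thesis's actual protocol:

```python
import hashlib

class DigestCache:
    """Toy client-side cache keyed by content digest rather than URL.

    If the server first advertises only the payload digest, a client that
    already holds a body with that digest can skip the transfer, even when
    the same bytes appear under many different URLs.
    """

    def __init__(self):
        self.bodies = {}  # digest -> payload bytes

    @staticmethod
    def digest(payload: bytes) -> str:
        return hashlib.sha256(payload).hexdigest()

    def fetch(self, url: str, server) -> bytes:
        advertised = server.head_digest(url)   # hypothetical digest probe
        if advertised in self.bodies:
            return self.bodies[advertised]     # redundant transfer avoided
        payload = server.get(url)              # full transfer
        self.bodies[self.digest(payload)] = payload
        return payload
```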
To evaluate several of my new methods, I conducted trace-driven simulations on an
unprecedented scale. This in turn required novel workload measurement methods and efficient
new characterization and simulation techniques. The performance benefits of my proposed
protocol extension are evaluated using two extraordinarily large and detailed workload
traces collected in a traditional corporate network environment and an unconventional
thin-client system.
My empirical research follows a simple but powerful paradigm: measure on a large
scale an important production environment's exogenous workload; identify performance
bounds inherent in the workload, independent of the system currently serving it; identify
gaps between actual and potential performance in the environment under study; and finally
devise ways to close these gaps through component modifications or through improved
inter-component integration. This approach may be applicable to a wide range of Web
services as they mature.
Ph.D. Computer Science and Engineering, University of Michigan
http://deepblue.lib.umich.edu/bitstream/2027.42/90029/1/kelly-optimization_web_caching.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/90029/2/kelly-optimization_web_caching.ps.bz
Survey of Transportation of Adaptive Multimedia Streaming service in Internet
The World Wide Web is one of the greatest boons of modern technological advancement. Using the Internet, users anywhere can access live and on-demand video services at any time. Streaming media systems such as YouTube, Netflix, and Apple Music dominate the multimedia world and enjoy wide popularity among users. A key concern for video streaming applications over the Internet is the Quality of Experience (QoE) that users perceive. Because changing network conditions, bit rate, and initial delay can freeze the multimedia stream or deliver poor video quality to end users, researchers across industry and academia have explored HTTP Adaptive Streaming (HAS), which splits video content into multiple segments and offers them to clients at varying qualities. The video player at the client side plays a vital role in buffer management and in choosing the appropriate bit rate for each segment to be transmitted. A video transmitted at too high a bit rate pauses intermittently, whereas a low-bit-rate video lacks quality, so a tradeoff between the two is required: the bit rate and video quality must be varied adaptively to match the conditions of the transmission medium. The main aim of this paper is to give an overview of state-of-the-art HAS techniques across the multimedia and networking domains. A detailed survey was conducted to analyze challenges and solutions in adaptive streaming algorithms, QoE, network protocols, buffering, and related areas. The paper also examines various challenges concerning QoE influence factors under fluctuating network conditions, which are often ignored in present HAS methodologies. Furthermore, this survey gives network and multimedia researchers a fair understanding of the latest developments in adaptive streaming and the improvements that can be incorporated in future work.
Abdullah, MTA.; Lloret, J.; Canovas Solbes, A.; García-García, L. (2017). Survey of Transportation of Adaptive Multimedia Streaming service in Internet. Network Protocols and Algorithms. 9(1-2):85-125. doi:10.5296/npa.v9i1-2.12412
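As a concrete illustration of the segment-rate tradeoff the survey describes, here is a minimal buffer-based adaptation heuristic in Python. The thresholds and encoding ladder are invented for the example and do not come from the paper:

```python
def select_bitrate(buffer_seconds: float, ladder_kbps: list[int],
                   low: float = 5.0, high: float = 20.0) -> int:
    """Pick a segment bit rate from the encoding ladder based on buffer level.

    A simple buffer-based heuristic: a near-empty buffer gets the lowest
    rendition to avoid stalls, a full buffer gets the highest, and levels
    in between map linearly onto the ladder. Thresholds are illustrative.
    """
    rates = sorted(ladder_kbps)
    if buffer_seconds <= low:
        return rates[0]
    if buffer_seconds >= high:
        return rates[-1]
    frac = (buffer_seconds - low) / (high - low)
    return rates[min(int(frac * len(rates)), len(rates) - 1)]

# Example: a 12-second buffer on a hypothetical five-step ladder
print(select_bitrate(12.0, [300, 750, 1500, 3000, 6000]))  # -> 1500
```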
Distributed Dynamic Replica Placement and Request Redirection in Content Delivery Networks
Chiara Petrioli, Giancarlo Bongiovanni, Francesco Lo Presti
Incremental volume rendering using hierarchical compression
Includes bibliographical references.
The research is based on the thesis that efficient volume rendering of datasets hosted on the Internet can be achieved on average personal workstations. We present a new algorithm for efficient incremental rendering of volumetric datasets. The primary goal of this algorithm is to give average workstations the ability to efficiently render volume data received over relatively low-bandwidth network links while maintaining rapid user feedback. Common limitations of workstation rendering of volume data include large memory overheads, the requirement of expensive rendering hardware, and the need for high-speed processing. The rendering algorithm presented here overcomes these problems by making use of the efficient Shear-Warp Factorisation method, which does not require specialised graphics hardware. However, the original Shear-Warp algorithm suffers from a high memory overhead and does not provide for incremental rendering, which is required if rapid user feedback is to be maintained. Our algorithm represents the volumetric data using a hierarchical data structure that provides for incremental classification and rendering of volume data, exploiting the multiscale nature of the octree data structure. The algorithm reduces the memory footprint of the original Shear-Warp Factorisation algorithm by a factor of more than two while maintaining good rendering performance. These factors make our octree algorithm more suitable for implementation on average desktop workstations for the purposes of interactive exploration of volume models over a network. This dissertation covers the theory and practice of developing the octree-based Shear-Warp algorithms, and then presents the results of extensive empirical testing. The results, using typical volume datasets, demonstrate the ability of the algorithm to achieve high rendering rates for both incremental and standard rendering while reducing runtime memory requirements.
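A minimal sketch of the hierarchical idea, not the dissertation's actual implementation: summarize a cubic volume block as an octree whose nodes store a mean density, so a coarse rendering is available immediately and refinement proceeds only where finer levels have been received or classified. The sketch assumes a power-of-two cubic block:

```python
import numpy as np

def build_octree(block: np.ndarray) -> dict:
    """Recursively summarize a cubic volume block as an octree.

    Each node keeps the mean of its block so a coarse image can be drawn
    before finer levels arrive; children cover the eight octants.
    """
    node = {"mean": float(block.mean()), "children": None}
    if block.shape[0] > 1:
        h = block.shape[0] // 2
        node["children"] = [
            build_octree(block[x:x + h, y:y + h, z:z + h])
            for x in (0, h) for y in (0, h) for z in (0, h)
        ]
    return node

volume = np.random.rand(8, 8, 8)   # stand-in for a real dataset
tree = build_octree(volume)
print(round(tree["mean"], 3))      # coarse level available immediately
```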
Equivalence classes for named function networking
Named Function Networking (NFN) is a generalization of Content-Centric Networking (CCN) and Named Data Networking (NDN). Beyond mere content retrieval, NFN makes it possible to ask for the results of computations. Names are not just content identifiers but λ-expressions that allow arbitrary composition of function calls and data accesses. λ-expressions are pure and deterministic. In other words, they have no side effects and always yield the same result. Both properties together are known as referential transparency. Referentially transparent functions can be evaluated individually, no matter where and in what order, e.g. geographically distributed and concurrently. This simplifies the distribution of computations in a network, an attractive feature in times of rising demand for edge computing. However, NFN lacks awareness of referentially opaque expressions, which are characterized by changing results or side effects, i.e. expressions that depend on outer conditions or modify outer state.
The fundamental motivation of this thesis is to retrofit NFN with a clearer notion of referentially opaque expressions. They are indispensable not only to many common use cases such as e-mail and database applications, but also to network technologies such as software-defined networking. We observed that many protocol decisions are based on expression matching, i.e. the search for equivalent expressions. Driven by this observation, this thesis explores possibilities to adapt the determination of equivalence depending on crucial expression properties, such as an expression's suitability for aggregation, concurrent evaluation, or permanent caching of its results. This exploration results in a comprehensive set of equivalence classes that is used for explicit attribution of expressions, leading to a system that is aware of the true nature of the expressions it handles. Moreover, we deliver a solution to support referentially opaque expressions and mutable state in an architecture built upon uniquely named and immutable data packets.
Altogether, the findings condense into an extended execution model. It summarizes how the attribution of expressions with equivalence classes influences specific protocol decisions in order to support referentially transparent as well as referentially opaque expressions. We believe that our approach is compelling due to its generality and extensibility: equivalence classes depend upon universal properties, so our approach is not bound to a specific elaboration such as NFN. We evaluate the applicability of our approach in a few application scenarios. Overall, the proposed solutions and concepts are an important contribution towards name-based distributed computations in information-centric networks.
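To make the attribution idea concrete, a small illustrative sketch follows. The class names and properties are examples chosen for exposition, not NFN's actual protocol encoding:

```python
from enum import Flag, auto

class ExprClass(Flag):
    """Illustrative equivalence-class attributes for named expressions."""
    PURE         = auto()  # referentially transparent: result never changes
    CACHEABLE    = auto()  # result may be stored and reused indefinitely
    AGGREGATABLE = auto()  # equivalent pending requests can be merged
    CONCURRENT   = auto()  # safe to evaluate at several places at once

OPAQUE = ExprClass(0)      # referentially opaque: none of the above hold

def may_aggregate(a: ExprClass, b: ExprClass, same_name: bool) -> bool:
    # Two in-flight requests are merged only if the expressions match
    # and both carry the aggregation property.
    return same_name and ExprClass.AGGREGATABLE in a and ExprClass.AGGREGATABLE in b

pure = (ExprClass.PURE | ExprClass.CACHEABLE
        | ExprClass.AGGREGATABLE | ExprClass.CONCURRENT)
print(may_aggregate(pure, pure, same_name=True))    # True
print(may_aggregate(OPAQUE, pure, same_name=True))  # False
```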
Energy Measurements of High Performance Computing Systems: From Instrumentation to Analysis
Energy efficiency is a major criterion for computing in general and High Performance Computing in particular. When optimizing for energy efficiency, it is essential to measure the underlying metric: energy consumption. To fully leverage energy measurements, their quality needs to be well-understood. To that end, this thesis provides a rigorous evaluation of various energy measurement techniques. I demonstrate how the deliberate selection of instrumentation points, sensors, and analog processing schemes can enhance the temporal and spatial resolution while preserving a well-known accuracy. Further, I evaluate a scalable energy measurement solution for production HPC systems and address its shortcomings.
Such high-resolution and large-scale measurements present challenges regarding the management of large volumes of generated metric data. I address these challenges with a scalable infrastructure for collecting, storing, and analyzing metric data. With this infrastructure, I also introduce a novel persistent storage scheme for metric time series data, which allows efficient queries for aggregate timelines.
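The aggregate-timeline idea can be sketched compactly. This toy, which is illustrative only and not the thesis's actual storage layout, precomputes (min, max, sum, count) nodes over power-of-two spans so long timelines can be summarized by touching O(log n) nodes instead of every raw sample:

```python
def build_levels(samples, base=2):
    """Precompute a hierarchy of (min, max, sum, count) aggregates.

    Level 0 is the raw series; each higher level merges `base` adjacent
    intervals, so an aggregate timeline over any span can be answered
    from a few precomputed nodes rather than the raw samples.
    """
    levels = [[(s, s, s, 1) for s in samples]]
    while len(levels[-1]) > 1:
        prev, merged = levels[-1], []
        for i in range(0, len(prev), base):
            chunk = prev[i:i + base]
            merged.append((
                min(c[0] for c in chunk),
                max(c[1] for c in chunk),
                sum(c[2] for c in chunk),
                sum(c[3] for c in chunk),
            ))
        levels.append(merged)
    return levels

levels = build_levels([3.0, 4.0, 2.5, 5.0, 4.5, 3.5, 4.0, 2.0])
lo, hi, total, n = levels[-1][0]
print(lo, hi, total / n)  # whole-series min, max, mean from one node
```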
To ensure that it satisfies the demanding requirements for scalable power measurements, I conduct an extensive performance evaluation and describe a productive deployment of the infrastructure.
Finally, I describe different approaches and practical examples of analyses based on energy measurement data. In particular, I focus on the combination of energy measurements and application performance traces. However, interweaving fine-grained power recordings and application events requires accurately synchronized timestamps on both sides. To overcome this obstacle, I develop a resilient and automated technique for time synchronization, which utilizes cross-correlation of a deliberately influenced power measurement signal (a short sketch follows the chapter outline below). Ultimately, this careful combination of sophisticated energy measurements and application performance traces yields detailed insight into application and system energy efficiency at full-scale HPC systems and down to millisecond-range regions.
1 Introduction
2 Background and Related Work
2.1 Basic Concepts of Energy Measurements
2.1.1 Basics of Metrology
2.1.2 Measuring Voltage, Current, and Power
2.1.3 Measurement Signal Conditioning and Analog-to-Digital Conversion
2.2 Power Measurements for Computing Systems
2.2.1 Measuring Compute Nodes using External Power Meters
2.2.2 Custom Solutions for Measuring Compute Node Power
2.2.3 Measurement Solutions of System Integrators
2.2.4 CPU Energy Counters
2.2.5 Using Models to Determine Energy Consumption
2.3 Processing of Power Measurement Data
2.3.1 Time Series Databases
2.3.2 Data Center Monitoring Systems
2.4 Influences on the Energy Consumption of Computing Systems
2.4.1 Processor Power Consumption Breakdown
2.4.2 Energy-Efficient Hardware Configuration
2.5 HPC Performance and Energy Analysis
2.5.1 Performance Analysis Techniques
2.5.2 HPC Performance Analysis Tools
2.5.3 Combining Application and Power Measurements
2.6 Conclusion
3 Evaluating and Improving Energy Measurements
3.1 Description of the Systems Under Test
3.2 Instrumentation Points and Measurement Sensors
3.2.1 Analog Measurement at Voltage Regulators
3.2.2 Instrumentation with Hall Effect Transducers
3.2.3 Modular Instrumentation of DC Consumers
3.2.4 Optimal Wiring for Shunt-Based Measurements
3.2.5 Node-Level Instrumentation for HPC Systems
3.3 Analog Signal Conditioning and Analog-to-Digital Conversion
3.3.1 Signal Amplification
3.3.2 Analog Filtering and Analog-To-Digital Conversion
3.3.3 Integrated Solutions for High-Resolution Measurement
3.4 Accuracy Evaluation and Calibration
3.4.1 Synthetic Workloads for Evaluating Power Measurements
3.4.2 Improving and Evaluating the Accuracy of a Single-Node Measuring System
3.4.3 Absolute Accuracy Evaluation of a Many-Node Measuring System
3.5 Evaluating Temporal Granularity and Energy Correctness
3.5.1 Measurement Signal Bandwidth at Different Instrumentation Points
3.5.2 Retaining Energy Correctness During Digital Processing
3.6 Evaluating CPU Energy Counters
3.6.1 Energy Readouts with RAPL
3.6.2 Methodology
3.6.3 RAPL on Intel Sandy Bridge-EP
3.6.4 RAPL on Intel Haswell-EP and Skylake-SP
3.7 Conclusion
4 A Scalable Infrastructure for Processing Power Measurement Data
4.1 Requirements for Power Measurement Data Processing
4.2 Concepts and Implementation of Measurement Data Management
4.2.1 Message-Based Communication between Agents
4.2.2 Protocols
4.2.3 Application Programming Interfaces
4.2.4 Efficient Metric Time Series Storage and Retrieval
4.2.5 Hierarchical Timeline Aggregation
4.3 Performance Evaluation
4.3.1 Benchmark Hardware Specifications
4.3.2 Throughput in Symmetric Configuration with Replication
4.3.3 Throughput with Many Data Sources and Single Consumers
4.3.4 Temporary Storage in Message Queues
4.3.5 Persistent Metric Time Series Request Performance
4.3.6 Performance Comparison with Contemporary Time Series Storage Solutions
4.3.7 Practical Usage of MetricQ
4.4 Conclusion
5 Energy Efficiency Analysis
5.1 General Energy Efficiency Analysis Scenarios
5.1.1 Live Visualization of Power Measurements
5.1.2 Visualization of Long-Term Measurements
5.1.3 Integration in Application Performance Traces
5.1.4 Graphical Analysis of Application Power Traces
5.2 Correlating Power Measurements with Application Events
5.2.1 Challenges for Time Synchronization of Power Measurements
5.2.2 Reliable Automatic Time Synchronization with Correlation Sequences
5.2.3 Creating a Correlation Signal on a Power Measurement Channel
5.2.4 Processing the Correlation Signal and Measured Power Values
5.2.5 Common Oversampling of the Correlation Signals at Different Rates
5.2.6 Evaluation of Correlation and Time Synchronization
5.3 Use Cases for Application Power Traces
5.3.1 Analyzing Complex Power Anomalies
5.3.2 Quantifying C-State Transitions
5.3.3 Measuring the Dynamic Power Consumption of HPC Applications
5.4 Conclusion
6 Summary and Outlook
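Returning to the time-synchronization technique of Chapter 5: its core step can be sketched with a standard cross-correlation, shown below in Python. The signal shapes, sampling rate, and offset are invented for the example and stand in for a deliberately injected load pattern and the power recording of it:

```python
import numpy as np

def estimate_offset(reference: np.ndarray, measured: np.ndarray, rate_hz: float) -> float:
    """Estimate the time shift between two recordings of the same signal.

    Cross-correlates a known excitation pattern (recorded with application
    timestamps) against the power measurement stream; the lag of the
    correlation peak gives the clock offset between the two time sources.
    """
    ref = (reference - reference.mean()) / reference.std()
    mea = (measured - measured.mean()) / measured.std()
    corr = np.correlate(mea, ref, mode="full")
    lag = np.argmax(corr) - (len(ref) - 1)
    return lag / rate_hz  # offset in seconds

rate = 1000.0                                        # 1 kS/s, illustrative
pattern = np.repeat([0.0, 1.0, 0.0, 1.0, 1.0, 0.0], 200)
shifted = np.concatenate([np.zeros(137), pattern])   # 137-sample delay
print(estimate_offset(pattern, shifted, rate))       # ~0.137 s
```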
An ontology enhanced parallel SVM for scalable spam filter training
This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V.
Spam, under a variety of shapes and forms, continues to inflict increased damage. Varying approaches, including Support Vector Machine (SVM) techniques, have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce-based parallel SVM algorithm for scalable spam filter training. By distributing, processing, and optimizing the subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the impact of accuracy degradation when distributing the training data among a number of SVM classifiers. Experimental results show that ontology-based augmentation improves the accuracy of the parallel SVM beyond its original sequential counterpart.
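The division of labor in such a parallel SVM can be sketched as a map step that trains local classifiers and a reduce step that retrains on the pooled support vectors. This is a cascade-style simplification under invented data; the paper's ontology-based augmentation is omitted, and a process pool stands in for MapReduce workers:

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np
from sklearn.svm import SVC

def map_train(chunk):
    """Map step: train a local SVM on one partition, emit its support vectors."""
    X, y = chunk
    clf = SVC(kernel="linear").fit(X, y)
    return X[clf.support_], y[clf.support_]

def reduce_train(partials):
    """Reduce step: retrain a global SVM on the pooled support vectors."""
    X = np.vstack([p[0] for p in partials])
    y = np.concatenate([p[1] for p in partials])
    return SVC(kernel="linear").fit(X, y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4000, 20))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)          # synthetic stand-in for spam labels
    chunks = [(X[i::4], y[i::4]) for i in range(4)]  # partition across 4 workers
    with ProcessPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(map_train, chunks))
    model = reduce_train(partials)
    print(model.score(X, y))
```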
Accurate modeling of core and memory locality for proxy generation targeting emerging applications and architectures
Designing optimal computer systems for improved performance and energy efficiency requires architects and designers to have a deep understanding of the end-user workloads. However, many end-users (e.g., large corporations, banks, defense organizations, etc.) are apprehensive to share their applications with designers due to the confidential nature of software code and data. In addition, emerging applications pose significant challenges to early design space exploration due to their long-running nature and the highly complex nature of their software stack that cannot be supported on many early performance models.
The above challenges can be overcome by using a proxy benchmark. A miniaturized proxy benchmark can be used as a substitute for the original workload in early computer performance evaluation. The process of generating a proxy benchmark consists of extracting a set of key statistics that summarize the behavior of end-user applications through profiling, and then using the collected statistics to synthesize a representative proxy benchmark. Such proxy benchmarks can help designers understand the behavior of end-users' workloads in a reasonable time without the users having to disclose sensitive information about those workloads.
Prior proxy benchmarking schemes leverage micro-architecture-independent metrics, derived from detailed simulation tools, to generate proxy benchmarks. However, many emerging workloads do not work reliably with such profiling or simulation tools, in which case it becomes impossible to apply prior proxy generation techniques to these complex applications. Furthermore, these techniques model instruction pipeline-level locality in great detail but abstract away memory locality using simple stride-based models. This results in poor cloning accuracy, especially for emerging applications, which have larger memory footprints and complex access patterns. A few detailed cache and memory locality modeling techniques have also been proposed in the literature. However, these techniques either model limited locality metrics and suffer from poor cloning accuracy, or are fairly accurate but at the expense of significant metadata overhead. Finally, none of the prior proxy benchmarking techniques model both core and memory locality with high accuracy. As a result, they are not useful for studying system-level performance behavior. Keeping these key limitations of prior work in mind, this dissertation presents several techniques that expand the frontiers of workload proxy benchmarking, enabling computer designers to gain a better and faster understanding of end-user application behavior without compromising the privileged nature of software or data.
This dissertation first presents a core-level proxy benchmark generation methodology that leverages performance metrics derived from hardware performance counter measurements to create miniature proxy benchmarks targeting emerging big-data applications. The presented performance-counter-based characterization, and the associated extrapolation into generic parameters for proxy generation, enable faster analysis (running almost at native hardware speeds, unlike prior workload cloning proposals) and proxy generation for emerging applications that do not work with simulators or profiling tools. The generated proxy benchmarks are representative of the performance of real-world big-data applications, including operating system and run-time effects, and yet converge to results quickly without needing any complex software stack support.
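A hedged sketch of counter-based characterization using the Linux `perf` tool follows; the CSV parsing is simplified and assumes a typical `perf stat -x,` output layout, and this is not the dissertation's actual tooling:

```python
import subprocess

def measure_ipc(cmd):
    """Run a workload under `perf stat` and derive instructions per cycle.

    Illustrates counter-based characterization: metrics are gathered at
    native speed on real hardware, with no simulator or instrumentation
    of the (possibly confidential) application binary required.
    """
    out = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", "instructions,cycles"] + cmd,
        capture_output=True, text=True,
    ).stderr  # perf writes its statistics to stderr
    counts = {}
    for line in out.splitlines():
        fields = line.split(",")
        if len(fields) > 2 and fields[0].isdigit():
            counts[fields[2].split(":")[0]] = int(fields[0])
    return counts["instructions"] / counts["cycles"]

print(measure_ipc(["sleep", "1"]))
```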
Next, to improve upon the accuracy and efficiency of prior memory proxy benchmarking techniques, this dissertation presents a novel memory locality modeling technique that leverages localized pattern detection to create miniature memory proxy benchmarks. The presented technique models memory reference locality by decomposing an application's memory accesses into a set of independent streams (localized using an address-region-based localization property), tracking fine-grained patterns within the localized streams, and finally chaining or interleaving accesses from different localized memory streams to create an ordered proxy memory access sequence. This dissertation further extends the workload cloning approach to Graphics Processing Units (GPUs) and presents a novel proxy generation methodology that models the inherent memory access locality of GPU applications while also accounting for the GPU's parallel execution model. The generated memory proxy benchmarks enable fast and efficient design space exploration of futuristic memory hierarchies.
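The localization step can be illustrated with a short sketch. The region granularity and the trace are invented for the example, and the dissertation's actual pattern detector is more elaborate:

```python
from collections import defaultdict, Counter

REGION_BITS = 12  # group addresses into 4 KiB regions (illustrative granularity)

def localize_and_detect(trace):
    """Split a flat address trace into per-region streams and find strides.

    Mirrors the idea of address-region-based localization: within each
    region, successive accesses often follow simple strides that a flat
    global stride model would miss because the streams interleave.
    """
    streams = defaultdict(list)
    for addr in trace:
        streams[addr >> REGION_BITS].append(addr)
    patterns = {}
    for region, addrs in streams.items():
        deltas = Counter(b - a for a, b in zip(addrs, addrs[1:]))
        patterns[region] = deltas.most_common(1)[0] if deltas else None
    return patterns

# Two interleaved streams: stride 64 in one region, stride 8 in another
a = [0x1000 + 64 * i for i in range(4)]
b = [0x9000 + 8 * i for i in range(4)]
trace = [x for pair in zip(a, b) for x in pair]  # interleave the streams
print(localize_and_detect(trace))                # dominant stride per region
```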
Finally, this dissertation presents a novel technique to integrate accurate core and memory locality models to create system-level proxy benchmarks targeting emerging applications. This is a new capability that can facilitate efficient design-space exploration of the overall system (core, cache, and memory subsystem). This dissertation further presents a novel methodology that exploits the synthetic benchmark generation framework to create hypothetical workloads with performance behavior that does not currently exist. Such proxies can be generated to cover anticipated code trends and can represent futuristic workloads before the workloads even exist.
Electrical and Computer Engineering