438 research outputs found

    Online Modeling and Tuning of Parallel Stream Processing Systems

    Get PDF
    Writing performant computer programs is hard. Code for high performance applications is profiled, tweaked, and re-factored for months, specifically for the hardware on which it is to run. Consumer application code does not get the benefit of this endless massaging, even though heterogeneous processor environments are beginning to resemble those in more performance-oriented arenas. This thesis offers a path to performant, parallel code (through stream processing) which is tuned online and automatically adapts to the environment it is given. This approach has the potential to reduce the tuning costs associated with high performance code and brings the benefit of performance tuning to consumer applications where it would otherwise be cost prohibitive.

    This thesis introduces a stream processing library and multiple techniques to enable its online modeling and tuning. Stream processing (also termed data-flow programming) is a compute paradigm that views an application as a set of logical kernels connected via communications links or streams. Stream processing is increasingly used by computational-x and x-informatics fields (e.g., biology, astrophysics) where the focus is on safe and fast parallelization of specific big-data applications. A major advantage of stream processing is that it enables parallelization without requiring the end-user to manually manage the non-deterministic behavior often characteristic of more traditional parallel processing methods. Many big-data and high performance applications involve high throughput processing, necessitating the use of many parallel compute kernels on several compute cores. Optimizing the orchestration of kernels has been the focus of much theoretical and empirical modeling work. Purely theoretical parallel programming models can fail when the assumptions implicit within the model are mismatched with reality (i.e., the model is incorrectly applied). Often it is unclear whether the assumptions are actually being met, even when verified under controlled conditions. Full empirical optimization solves this problem by extensively searching the range of likely configurations under native operating conditions; this, however, is expensive in both time and energy. For large, massively parallel systems, even deciding which modeling paradigm to use is often prohibitively expensive, and the answer is transient with workload and hardware. In an ideal world, a parallel run-time would re-optimize an application continuously to match its environment, with little additional overhead. This work presents methods aimed at doing just that through low-overhead instrumentation, modeling, and optimization.

    Online optimization provides a good trade-off between static optimization and online heuristics. To enable online optimization, modeling decisions must be fast and relatively accurate. Online modeling and optimization of a stream processing system first requires a stream processing framework that is amenable to the intended type of dynamic manipulation. To fill this void, we developed the RaftLib C++ template library, which enables use of the stream processing paradigm in C++ applications (it is the run-time underlying almost all of the work in this dissertation). An application topology is specified by the user; almost everything else, however, is optimizable by the run-time. RaftLib takes advantage of the knowledge gained during the design of several prior streaming languages (notably Auto-Pipe). The resultant framework enables online migration of tasks, auto-parallelization, online buffer reallocation, and other useful dynamic behaviors that were not available in many previous stream processing systems. Several benchmark applications have been designed to assess the performance gains of our approaches and to compare performance with other leading stream processing frameworks. Information is essential to any modeling task; to that end, a low-overhead instrumentation framework has been developed which is both dynamic and adaptive. Discovering a fast and relatively optimal configuration for a stream processing application often necessitates solving for buffer sizes within a finite-capacity queueing network. We show that a generalized gain/loss network flow model can bootstrap this process under certain conditions. Any modeling effort requires that a model be selected, often a highly manual task involving many expensive operations. This dissertation demonstrates that machine learning methods (such as a support vector machine) can successfully select models at run-time for a streaming application. The full set of approaches is incorporated into the open source RaftLib framework.
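    The kernel/stream structure described above can be made concrete with a small sketch. The following is a minimal illustration of the data-flow idea, assuming a bounded, thread-safe FIFO as the stream; it does not use RaftLib's actual API, and all names in it are hypothetical:

```cpp
#include <condition_variable>
#include <cstddef>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

template <typename T>
class Stream {  // bounded, thread-safe FIFO acting as the "stream" between kernels
public:
    explicit Stream(std::size_t cap) : cap_(cap) {}
    void push(T v) {
        std::unique_lock<std::mutex> l(m_);
        not_full_.wait(l, [&] { return q_.size() < cap_; });
        q_.push(std::move(v));
        not_empty_.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> l(m_);
        not_empty_.wait(l, [&] { return !q_.empty(); });
        T v = std::move(q_.front());
        q_.pop();
        not_full_.notify_one();
        return v;
    }
private:
    std::queue<T> q_;
    std::size_t cap_;
    std::mutex m_;
    std::condition_variable not_empty_, not_full_;
};

int main() {
    Stream<int> link(8);            // buffer size: the kind of knob an online tuner would resize
    std::thread producer([&] {      // kernel 1: source
        for (int i = 1; i <= 5; ++i) link.push(i * i);
        link.push(-1);              // end-of-stream marker
    });
    std::thread consumer([&] {      // kernel 2: sink
        for (int v = link.pop(); v != -1; v = link.pop())
            std::cout << "got " << v << '\n';
    });
    producer.join();
    consumer.join();
}
```

    The stream's capacity is exactly the kind of parameter the online modeling and tuning described above would adjust at run-time, such as when solving for buffer sizes within the finite-capacity queueing network.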

    Detection and Mitigation of Impairments for Real-Time Multimedia Applications

    Get PDF
    Measures of Quality of Service (QoS) for multimedia services should focus on phenomena that are observable to the end-user. Metrics such as delay and loss may have little direct meaning to the end-user, because knowledge of specific coding and/or adaptive techniques is required to translate delay and loss into user-perceived performance. Impairment events, as defined in this dissertation, are observable by end-users independent of the coding, adaptive playout or packet loss concealment techniques employed by their multimedia applications. Methods for detecting real-time multimedia (RTM) impairment events from end-to-end measurements are developed here and evaluated using 26 days of PlanetLab measurements collected over nine different Internet paths. Furthermore, methods for detecting impairment-causing network events like route changes and congestion are also developed. The detection techniques developed in this work can be used by applications to detect and match their response to network events. The heuristics-based techniques for detecting congestion and route changes were evaluated using PlanetLab measurements. Congestion events were found to occur for 6-8 hours per day on weekdays on two of the paths. The heuristics-based route change detection algorithm detected 71% of the visible layer 2 route changes; it missed events that occurred too close together in time and events for which the minimum RTT change was small. A practical model-based route change detector, named the parameter unaware detector (PUD), is also developed in this dissertation, because model-based detectors were expected to perform better than the heuristics-based detector. In addition, the optimal detector, named the parameter aware detector (PAD), is developed; it is useful because it provides an upper bound on the performance of any detector. The analysis for predicting the performance of PAD is another important contribution of this work. Simulation results show that the model-based PUD algorithm has acceptable performance over a larger region of the parameter space than the heuristics-based algorithm, and that this difference in performance increases with the window size. It is also shown that both practical algorithms have a smaller acceptable-performance region than the optimal algorithm. The model-based algorithms proposed in this dissertation assume that RTTs have a Gamma density function. This Gamma distribution assumption may not hold when there are wireless links in the path. A study of CDMA 1xEVDO networks was initiated to understand the delay characteristics of these networks. During this study, it was found that the widely deployed proportional-fair (PF) scheduler can be corrupted, accidentally or deliberately, to cause RTM impairments. This is demonstrated using measurements conducted over both in-lab and deployed CDMA 1xEVDO networks. A new variant of PF that removes this impairment vulnerability of the PF algorithm is proposed and evaluated using ns-2 simulations. It is shown that this new scheduler, together with a new adaptive-alpha initialization strategy, reduces the starvation problem of the PF algorithm.
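    As a rough illustration of the route change heuristics discussed above, a sketch of a windowed minimum-RTT detector follows: a route change is flagged when the RTT floor of the current measurement window shifts by more than a threshold relative to the previous window. This is not the dissertation's PUD or PAD detector, and the threshold and sample values are assumptions for illustration:

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

// flag a route change when the minimum RTT shifts persistently between windows
bool routeChange(const std::vector<double>& prevWindow,
                 const std::vector<double>& curWindow,
                 double thresholdMs) {
    double minPrev = *std::min_element(prevWindow.begin(), prevWindow.end());
    double minCur  = *std::min_element(curWindow.begin(), curWindow.end());
    return std::fabs(minCur - minPrev) > thresholdMs;
}

int main() {
    std::vector<double> before{42.1, 43.0, 42.4, 45.9};  // RTTs (ms), old path
    std::vector<double> after {55.3, 54.8, 56.1, 55.0};  // RTTs (ms), new path
    std::cout << (routeChange(before, after, 5.0) ? "route change\n"
                                                  : "no change\n");
}
```

    A detector of this form illustrates the abstract's two failure modes: changes arriving faster than the window length blur together, and a small minimum-RTT shift falls under the threshold.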

    Modelling and performability evaluation of Wireless Sensor Networks

    Get PDF
    This thesis presents generic analytical models of homogeneous clustered Wireless Sensor Networks (WSNs) with a centrally located Cluster Head (CH) coordinating cluster communication with the sink, either directly or through other intermediate nodes. The focus is to integrate performance and availability studies of WSNs in the presence of sensor node and channel failures and repair/replacement, with the main purpose of improving WSN Quality of Service (QoS). Other research considered in this thesis includes modelling of the packet arrival distribution at the CH and intermediate nodes, and modelling of energy consumption at the sensor nodes. An investigation and critical analysis of wireless sensor network architectures, energy conservation techniques and QoS requirements is performed in order to improve the performance and availability of the network.

    Existing techniques used for performance evaluation of single- and multi-server systems with several operative states are investigated and analysed in detail. To begin with, existing approaches for independent (pure) performance modelling are critically analysed, with their merits and drawbacks highlighted. Similarly, pure availability modelling approaches are analysed. Considering that pure performance models tend to be too optimistic and pure availability models too conservative, performability, the integration of performance and availability studies, is used for the evaluation of the WSN models developed in this study. Two-dimensional Markov state space representations of the systems are used for performability modelling. Following critical analysis of the existing solution techniques, the spectral expansion method and a system of simultaneous linear equations are developed and used to solve the proposed models. To validate the results obtained with the two techniques, a discrete event simulation tool is employed.

    In this research, open queuing networks are used to model the behaviour of the CH when subjected to streams of traffic from cluster nodes in addition to the dynamics of operating in the various states. The research begins with a model of a CH with infinite queue capacity subject to failures and repair/replacement. The model is developed progressively to consider bounded queue capacity, channel failures and sleep scheduling mechanisms for performability evaluation of WSNs. Using the developed models, various performance measures of the considered system, including mean queue length, throughput, response time and blocking probability, are evaluated. Finally, energy models considering the mean power consumption in each of the possible operative states are developed. The resulting models are in turn employed to evaluate energy savings for the proposed case study model. Numerical solutions and discussions are presented for all the queuing models developed, and simulation is performed to validate the accuracy of the results obtained.

    To address issues of performance and availability of WSNs, current research presents independent performance and availability studies; the concerns raised by such studies have therefore remained unresolved over the years, leading to persistently poor system performance. The novelty of this research is an integrated performance and availability modelling approach for WSNs meant to address the shortcomings of independent studies. In addition, a novel methodology for modelling and evaluating power consumption is offered. The proposed models provide remarkable improvements in system performance and availability, in addition to providing tools for further optimisation studies, and significant power savings are also observed from the proposed model results. To further improve QoS for WSNs, the proposed models could incorporate priority queuing in a mixed traffic environment, and a multi-server model would be appropriate for addressing traffic routing. It is also possible to extend the proposed energy model to sleep scheduling mechanisms other than the on-demand mechanism proposed herein. Analysis and classification of the possible arrival distributions of WSN packets for various application environments would further enable robust scientific research.
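    The two-dimensional Markov idea above can be sketched numerically. The example below builds a small generator matrix for a cluster head with a bounded queue (0..K jobs) and an UP/DOWN operative dimension, then finds steady-state probabilities by power iteration on the uniformized chain. This is a simpler solver than the spectral expansion method the thesis uses, and all rate values are illustrative assumptions:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    const int K = 10;                               // queue capacity
    const double lambda = 0.8;                      // arrival rate
    const double mu = 1.0;                          // service rate (only when UP)
    const double xi = 0.01;                         // failure rate
    const double eta = 0.1;                         // repair rate
    const int N = 2 * (K + 1);                      // states (op, n), op: 0=DOWN, 1=UP
    auto idx = [&](int op, int n) { return op * (K + 1) + n; };

    // build the generator matrix Q of the two-dimensional Markov chain
    std::vector<std::vector<double>> Q(N, std::vector<double>(N, 0.0));
    for (int op = 0; op < 2; ++op)
        for (int n = 0; n <= K; ++n) {
            int s = idx(op, n);
            if (n < K) Q[s][idx(op, n + 1)] += lambda;        // arrival
            if (op == 1 && n > 0) Q[s][idx(1, n - 1)] += mu;  // service
            if (op == 1) Q[s][idx(0, n)] += xi;               // failure
            if (op == 0) Q[s][idx(1, n)] += eta;              // repair
        }
    double Lam = 0.0;                                // uniformization constant
    for (int s = 0; s < N; ++s) {
        double out = 0.0;
        for (int t = 0; t < N; ++t)
            if (t != s) out += Q[s][t];
        Q[s][s] = -out;
        Lam = std::max(Lam, out);
    }
    Lam *= 1.01;

    // steady state by power iteration: pi <- pi (I + Q / Lam)
    std::vector<double> pi(N, 1.0 / N), next(N);
    for (int it = 0; it < 20000; ++it) {
        for (int t = 0; t < N; ++t) {
            double v = pi[t];
            for (int s = 0; s < N; ++s) v += pi[s] * Q[s][t] / Lam;
            next[t] = v;
        }
        pi.swap(next);
    }

    double meanQ = 0.0;
    for (int op = 0; op < 2; ++op)
        for (int n = 0; n <= K; ++n) meanQ += n * pi[idx(op, n)];
    double blocking = pi[idx(0, K)] + pi[idx(1, K)];  // arrivals lost when queue is full
    std::cout << "mean queue length " << meanQ
              << ", blocking probability " << blocking << '\n';
}
```

    The same steady-state vector would also drive an energy model of the kind described above: weight the mean power drawn in each operative state by that state's probability.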

    Resource dimensioning in a mixed traffic environment

    Get PDF
    An important goal of modern data networks is to support multiple applications over a single network infrastructure. The combination of data, voice, video and conference traffic, each requiring a unique Quality of Service (QoS), makes resource dimensioning a very challenging task. Guaranteeing QoS by mere over-provisioning of bandwidth is not viable in the long run, as network resources are expensive. The aim of proper resource dimensioning is to provide the required QoS while making optimal use of the allocated bandwidth. Dimensioning parameters used by service providers today are based on best-practice recommendations, and are not necessarily optimal. This dissertation focuses on resource dimensioning for the DiffServ network architecture. Four predefined traffic classes, i.e. Real Time (RT), Interactive Business (IB), Bulk Business (BB) and General Data (GD), needed to be dimensioned in terms of bandwidth allocation and traffic regulation. To perform this task, a study was made of the DiffServ mechanism and the QoS requirements of each class. Traffic generators were required for each class to perform simulations. Our investigations show that the dominant Transport Layer protocol for the RT class is UDP, while TCP is mostly used by the other classes. This led to a separate analysis of, and requirement for, traffic models for UDP and TCP traffic. Analysis of real-world data shows that modern network traffic is characterized by long-range dependency, self-similarity and a very bursty nature. Our evaluation of various traffic models indicates that the Multi-fractal Wavelet Model (MWM) is best for TCP due to its ability to capture long-range dependency and self-similarity. The Markov Modulated Poisson Process (MMPP) is able to model the occasional long OFF-periods and burstiness present in UDP traffic. Hence, these two models were used in simulations.

    A test bed was implemented to evaluate the performance of the four traffic classes defined in DiffServ. Traffic was sent through the test bed while delay and loss were measured. For single-class simulations, dimensioning values were obtained while conforming to the QoS specifications. Multi-class simulations investigated the effects of statistical multiplexing on the obtained values. Simulation results for various numerical provisioning factors (PF) were obtained. These factors are used to determine the link data rate as a function of the required average bandwidth and QoS. The use of class-based differentiation for QoS showed that strict delay and loss bounds can be guaranteed, even in the presence of very high (up to 90%) bandwidth utilization. Simulation results showed small deviations from best-practice recommended PF values: a value of 4 is currently used for both the RT and IB classes, while 2 is used for the BB class. This dissertation indicates that 3.89 for RT, 3.81 for IB and 2.48 for BB achieve the prescribed QoS more accurately. It was concluded that either the bandwidth distribution among classes or the quality guarantees for the BB class should be adjusted, since the RT and IB classes over-performed while BB under-performed. The results contribute to the process of resource dimensioning by adding value to dimensioning parameters through simulation rather than mere intuition or educated guessing.
    Dissertation (MEng (Electronic Engineering)), University of Pretoria, 2007. Electrical, Electronic and Computer Engineering.
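    As a sketch of the MMPP traffic model mentioned above, the following generates arrival timestamps from a two-state Markov Modulated Poisson Process: arrivals are Poisson at a state-dependent rate, and the hidden state switches after exponentially distributed sojourn times, producing bursts separated by quiet periods. All rate parameters are illustrative assumptions, not values from the dissertation:

```cpp
#include <iostream>
#include <random>

int main() {
    std::mt19937 rng(42);
    const double rate[2]       = {200.0, 5.0};  // packet rate (pkts/s): burst vs. quiet state
    const double switchRate[2] = {0.5, 2.0};    // rate of leaving each state (1/s)
    auto expSample = [&](double r) {
        std::exponential_distribution<double> d(r);
        return d(rng);
    };
    int state = 0;                              // start in the burst state
    double t = 0.0;
    double stateEnd = expSample(switchRate[0]);
    for (int emitted = 0; emitted < 20; ) {
        double next = t + expSample(rate[state]);  // candidate next arrival
        if (next >= stateEnd) {                 // state switches first; by memorylessness
            t = stateEnd;                       // the arrival is redrawn at the new rate
            state ^= 1;
            stateEnd = t + expSample(switchRate[state]);
        } else {
            t = next;
            std::cout << "arrival at " << t << " s (state " << state << ")\n";
            ++emitted;
        }
    }
}
```

    Feeding such synthetic UDP arrivals (alongside an MWM generator for TCP) through a DiffServ test bed is the style of experiment the dissertation uses to derive its provisioning factors.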

    HRM: merging hardware event monitors for improved timing analysis of complex MPSoCs

    Get PDF
    The Performance Monitoring Unit (PMU) in MPSoCs is at the heart of the latest measurement-based timing analysis techniques in Critical Embedded Systems. In particular, hardware event monitors (HEMs) in the PMU are used as building blocks in the process of budgeting and verifying software timing by tracking and controlling access counts to shared resources. While the number of HEMs in current MPSoCs reaches hundreds, they are read via Performance Monitoring Counters whose number is usually limited to 4-8, thus requiring multiple runs of each experiment in order to collect all desired HEMs. Despite the effort of engineers in controlling the execution conditions of each experiment, the complexity of current MPSoCs makes it arguably impossible to completely remove the noise affecting each run. As a result, HEMs read in different runs are subject to different variability, and hence HEMs captured in different runs cannot be ‘blindly’ merged. In this work, we focus on the NXP T2080 platform, where we observed up to 59% variability across different runs of the same experiment for some relevant HEMs (e.g. processor cycles). We develop a HEM reading and merging (HRM) approach to reliably join HEMs across different runs as a fundamental element of any measurement-based timing budgeting and verification technique. Our method builds on order statistics and the selection of an anchor HEM read in all runs to derive the most plausible combination of HEM readings, keeping the distribution of each HEM and its relationship with the anchor HEM intact.
    This work has been partially supported by the Spanish Ministry of Science and Innovation under grant PID2019-107255GB, the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 772773) and the HiPEAC Network of Excellence. Peer reviewed. Postprint (author's final draft).
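    The order-statistics intuition behind the anchor HEM can be sketched as follows: rank the runs by their anchor reading and pair the readings of other HEMs rank-to-rank, which preserves each HEM's marginal distribution and its monotone relation to the anchor. This is an illustration of the idea, not the paper's exact HRM algorithm, and the values below are fabricated:

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

int main() {
    // (anchor reading, HEM reading) for the runs in which each HEM was collected;
    // the anchor (e.g. processor cycles) is read in every run
    std::vector<std::pair<double, double>> runsA = {{1000, 40}, {1100, 46}, {900, 35}};
    std::vector<std::pair<double, double>> runsB = {{1050, 7}, {950, 5}, {1200, 9}};

    auto byAnchor = [](const std::pair<double, double>& a,
                       const std::pair<double, double>& b) { return a.first < b.first; };
    std::sort(runsA.begin(), runsA.end(), byAnchor);
    std::sort(runsB.begin(), runsB.end(), byAnchor);

    // merge rank-to-rank: the k-th smallest-anchor run of HEM A is joined with
    // the k-th smallest-anchor run of HEM B
    for (std::size_t k = 0; k < runsA.size(); ++k)
        std::cout << "rank " << k << ": anchor~" << runsA[k].first
                  << "  hemA=" << runsA[k].second
                  << "  hemB=" << runsB[k].second << '\n';
}
```

    Pairing by anchor rank rather than by run index is what protects the merged readings from the per-run noise the abstract describes.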

    On packet switch design

    Get PDF

    A PC-based data acquisition system for sub-atomic physics measurements

    Get PDF
    Modern particle physics measurements are heavily dependent upon automated data acquisition systems (DAQ) to collect and process experiment-generated information. One research group from the University of Saskatchewan utilizes a DAQ known as the Lucid data acquisition and analysis system. This thesis examines the project undertaken to upgrade the hardware and software components of Lucid. To establish the effectiveness of the system upgrades, several performance metrics were obtained, including the system's dead time and input/output bandwidth.

    Hardware upgrades to Lucid consisted of replacing its aging digitization equipment with modern, faster-converting Versa-Module Eurobus (VME) technology and replacing the instrumentation processing platform with common PC hardware. The new processor platform is coupled to the instrumentation modules via a fiber-optic bridging device, the sis1100/3100 from Struck Innovative Systems.

    The software systems of Lucid were also modified to follow suit with the new hardware. Originally constructed to utilize a proprietary real-time operating system, the data acquisition application was ported to run under the freely available Real-Time Executive for Multiprocessor Systems (RTEMS). The device driver software provided with the sis1100/3100 interface also had to be ported for use under the RTEMS-based system.

    Performance measurements of the upgraded DAQ indicate that the dead time has been reduced from the order of milliseconds to the order of several tens of microseconds. This increased capability means that Lucid's users may acquire significantly more data in a shorter period of time, thereby decreasing both the statistical uncertainties and the data collection duration associated with a given experiment.
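    As a minimal sketch of the dead-time metric reported above: given the time each trigger arrived and the time the DAQ was ready again, the dead-time fraction is the share of the run during which new events could not be accepted. The timestamps below are fabricated for illustration; this is not Lucid's instrumentation code:

```cpp
#include <iostream>
#include <utility>
#include <vector>

int main() {
    // (trigger time, ready-again time) in microseconds; fabricated values
    std::vector<std::pair<double, double>> events = {
        {0.0, 35.0}, {120.0, 158.0}, {260.0, 300.0}, {410.0, 445.0}};
    double dead = 0.0;
    for (const auto& e : events) dead += e.second - e.first;  // busy (dead) intervals
    double span = events.back().second - events.front().first;
    std::cout << "dead-time fraction: " << dead / span << '\n';
}
```

    Shrinking each busy interval from milliseconds to tens of microseconds, as the upgrade did, directly shrinks this fraction and raises the achievable event rate.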

    Final report on the evaluation of RRM/CRRM algorithms

    Get PDF
    Public deliverable of the EVEREST project. This deliverable provides a definition and a complete evaluation of the RRM/CRRM algorithms selected in D11 and D15, evolved and refined through an iterative process. The evaluation is carried out by means of simulations using the simulators provided in D07 and D14. Preprint.