
    Evaluation, Analysis and adaptation of web prefetching techniques in current web

    Abstract: This dissertation focuses on the study of the prefetching technique applied to the World Wide Web. This technique consists of processing (e.g., downloading) a web request before the user actually makes it. By doing so, the waiting time perceived by the user can be reduced, which is the main goal of web prefetching techniques. The study of the state of the art of web prefetching revealed the heterogeneity that exists in its performance evaluation. This heterogeneity concerns mainly four issues: i) there was no open framework to simulate and evaluate the previously proposed prefetching techniques; ii) there was no uniform selection of the performance indexes to be maximized, or even of their definitions; iii) there were no comparative studies of prediction algorithms that took into account the costs and benefits of web prefetching at the same time; and iv) techniques were evaluated under very different workloads, or under only a few significant ones. During the research work, we contributed to homogenizing the evaluation of prefetching performance by developing an open simulation framework that reproduces in detail all the aspects that affect prefetching performance. In addition, prefetching performance metrics were analyzed in order to clarify their definitions and to identify the most meaningful ones from the user's point of view. We also proposed an evaluation methodology that considers the cost and the benefit of prefetching at the same time. Finally, the importance of using current workloads to evaluate prefetching techniques was highlighted; otherwise, wrong conclusions could be reached. The potential benefits of each web prefetching architecture were analyzed, finding that collaborative predictors could reduce almost all the latency perceived by users. The first step towards a collaborative predictor is to make predictions at the server, so this thesis focuses on an architecture with a server-located predictor. The environment conditions that can be found in the web are also…
    Doménech I De Soria, J. (2007). Evaluation, Analysis and adaptation of web prefetching techniques in current web [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1841
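
    To make the idea of a server-located predictor concrete, the sketch below shows a hypothetical first-order Markov predictor in C. It is an illustration only, not one of the prediction algorithms evaluated in the thesis, and the threshold value is an assumption: the server counts object-to-object transitions observed in client sessions and, on each request, emits prefetch hints for successors whose estimated probability exceeds the threshold. The precision and recall of such hints are among the performance indexes whose definitions the thesis sets out to homogenize.

    /* Minimal sketch of a server-located prefetch predictor (hypothetical,
     * not the algorithms studied in the thesis): a first-order Markov
     * model over per-session request transitions. */
    #include <stdio.h>

    #define MAX_OBJECTS 256
    #define CONFIDENCE  0.4   /* prefetch hint threshold (assumed value) */

    static int transitions[MAX_OBJECTS][MAX_OBJECTS]; /* counts: prev -> next */
    static int outgoing[MAX_OBJECTS];                 /* total transitions from prev */

    /* Update the model with an observed transition prev -> next. */
    void observe(int prev, int next)
    {
        transitions[prev][next]++;
        outgoing[prev]++;
    }

    /* Emit prefetch hints for the likely successors of the current object. */
    void predict(int current)
    {
        if (outgoing[current] == 0)
            return;
        for (int next = 0; next < MAX_OBJECTS; next++) {
            double p = (double)transitions[current][next] / outgoing[current];
            if (p >= CONFIDENCE)
                printf("hint: prefetch object %d (p=%.2f)\n", next, p);
        }
    }

    int main(void)
    {
        /* Toy session history: object 1 is usually followed by object 2. */
        observe(1, 2); observe(1, 2); observe(1, 3);
        predict(1);   /* prints a hint for object 2 only (p = 0.67) */
        return 0;
    }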

    Types for DSP Assembler Programs


    Measurement of Triple-Differential Z+Jet Cross Sections with the CMS Detector at 13 TeV and Modelling of Large-Scale Distributed Computing Systems

    The achievable precision of the predictions calculated for observables measured at the LHC experiments depends on the amount of invested computing power and on the precision of the input parameters that enter the calculation. Currently, no theory exists that can derive the values of the input parameters for perturbative calculations from first principles. Instead, they have to be derived from measurements in dedicated analyses that measure observables sensitive to the input parameters with high precision. Such an analysis is presented: it measures the production cross section of oppositely charged muon pairs with an invariant mass close to the mass of the Z boson, produced in association with jets, in a phase space divided into bins of the transverse momentum of the dimuon system, p_T^Z, and of two observables, y* and y_b, constructed from the rapidities of the dimuon system and of the jet with the highest momentum. To achieve the highest statistical precision in this triple-differential measurement, the full dataset recorded by the CMS experiment at a center-of-mass energy of √s = 13 TeV in the years 2016 to 2018 is combined. The measured cross sections are compared to theoretical predictions approximating full NNLO accuracy in perturbative QCD. Deviations from these predictions are observed, rendering further studies at full NNLO accuracy necessary. To obtain the measured results, large amounts of data are processed and analysed on distributed computing infrastructures. Theoretical calculations pose similar computing demands. Consequently, substantial amounts of storage and processing resources are required by the LHC collaborations. These requirements are met in large part by the resources of the WLCG, a complex federation of globally distributed computer centres. With the upgrade of the LHC and the experiments for the HL-LHC era, the computing demands are expected to increase substantially. Therefore, the prevailing computing models need to be updated to cope with the unprecedented demands. To support the design of future adaptations of HEP workflow execution on such infrastructures, a simulation model is developed and an implementation is tested on infrastructure design candidates inspired by a proposal of the German HEP computing community. The presented study of these infrastructure candidates showcases the applicability of the simulation tool in the strategic development of a future computing infrastructure for HEP in the HL-LHC context.
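
    For orientation, the observables y* and y_b are conventionally built from the rapidity of the dimuon (Z) system, y^Z, and the rapidity of the leading jet, y^jet. The abstract does not spell out the definitions, so the following is the usual convention in such triple-differential Z+jet analyses rather than a quotation from the thesis:

    y_b = \tfrac{1}{2}\left| y^{\mathrm{Z}} + y^{\mathrm{jet}} \right| ,
    \qquad
    y^{*} = \tfrac{1}{2}\left| y^{\mathrm{Z}} - y^{\mathrm{jet}} \right|

    Large y_b thus selects events boosted along the beam axis, while large y* selects events with a large rapidity separation between the Z boson and the leading jet.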

    Compilation techniques for irregular problems on parallel machines

    Massively parallel computers have ushered in the era of teraflop computing. Even though large and powerful machines are being built, they are used by only a fraction of the computing community. The fundamental reason for this situation is that parallel machines are difficult to program. The development of compilers that automatically parallelize programs will greatly increase the use of these machines.
    A large class of scientific problems can be categorized as irregular computations. In this class of computation, the data access patterns are known only at runtime, creating significant difficulties for a parallelizing compiler that tries to generate efficient parallel code. Some compilers with very limited abilities to parallelize simple irregular computations exist, but the methods used by these compilers fail for any non-trivial application code.
    This research presents the development of compiler transformation techniques that can be used to effectively parallelize an important class of irregular programs. A central aim of these transformation techniques is to generate code that aggressively prefetches data. Program slicing methods are used as part of the code generation process. In this approach, a program written in a data-parallel language, such as HPF, is transformed so that it can be executed on a distributed memory machine. An efficient compiler runtime support system has been developed that performs data movement and software caching.
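
    Compilers for irregular codes of this kind commonly emit an inspector/executor pair: an inspector examines the runtime-known index arrays and builds a schedule of the data to gather, and an executor then runs the loop against locally available copies. The sketch below uses hypothetical names and runs on a single node; it does not show the thesis's slicing-based code generation or its runtime support system, only the general shape of the transformation.

    /* Inspector/executor sketch for an irregular loop y[i] += x[idx[i]]
     * (illustrative only). The inspector analyses the runtime-known index
     * array and records which elements must be fetched; the executor then
     * performs the computation against the (conceptually prefetched) data. */
    #include <stdio.h>

    #define N 8

    /* Inspector: record which elements of x the loop will touch. */
    int inspect(const int *idx, int n, int *needed)
    {
        int count = 0;
        for (int i = 0; i < n; i++)
            needed[count++] = idx[i];   /* a real system would deduplicate and
                                           translate to owner/offset pairs */
        return count;
    }

    /* Executor: perform the computation using the gathered values. */
    void execute(double *y, const double *x, const int *idx, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] += x[idx[i]];          /* x[] stands in for a software cache */
    }

    int main(void)
    {
        double x[N] = {1, 2, 3, 4, 5, 6, 7, 8}, y[N] = {0};
        int idx[N]  = {7, 3, 3, 0, 5, 1, 6, 2};
        int needed[N];

        int m = inspect(idx, N, needed);   /* schedule built once, reusable */
        printf("inspector found %d accesses to gather\n", m);
        execute(y, x, idx, N);
        printf("y[0] = %.1f\n", y[0]);     /* 8.0, since y[0] += x[7] */
        return 0;
    }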

    Efficient openMP over sequentially consistent distributed shared memory systems

    Nowadays, clusters are one of the most widely used platforms in High Performance Computing, and most programmers use the Message Passing Interface (MPI) library to program their applications on these distributed platforms and obtain their maximum performance, although it is a complex task. On the other hand, OpenMP has become established as the de facto standard for programming applications on shared memory platforms because it is easy to use and obtains good performance without too much effort. So, could it be possible to join both worlds? Could programmers use the ease of OpenMP on distributed platforms? Many researchers think so, and one of the ideas developed to this end is distributed shared memory (DSM), a software layer on top of a distributed platform that gives an abstract shared memory view to the applications. Even though it seems a good solution, it also has some drawbacks: memory coherence between the nodes of the platform is difficult to maintain (complex management, scalability issues, high overhead, among others), and the latency of remote memory accesses can be orders of magnitude greater than on a shared bus because of the interconnection network. Therefore, this research improves the performance of OpenMP applications executed on distributed memory platforms using a DSM with sequential consistency, thoroughly evaluating the results on the NAS Parallel Benchmarks. The vast majority of existing DSM designs use a relaxed consistency model because it avoids some major problems in the area. In contrast, we use a sequential consistency model because we think that exposing these potential problems, which otherwise remain hidden, may allow solutions to be found and then applied to both models. The main idea behind this work is that both runtimes, the OpenMP runtime and the DSM layer, should cooperate to achieve good performance; otherwise they interfere with each other, trashing the final performance of the applications. We develop three different contributions to improve the performance of these applications: (a) a technique to avoid false sharing at runtime, (b) a technique to mimic the MPI behaviour, where produced data is forwarded to its consumers, and (c) a mechanism to avoid the network congestion caused by the DSM coherence messages. The NAS Parallel Benchmarks are used to test the contributions. The results of this work show that false sharing is a relative problem that depends on each application. Another result is that moving the data flow out of the critical path and using techniques that forward data as early as possible, as MPI does, benefits the final application performance. Additionally, this data movement is usually concentrated at single points and affects the application performance due to the limited bandwidth of the network; therefore, it is necessary to provide mechanisms that allow this data to be distributed over the computation time, using an otherwise idle network. Finally, the results show that the proposed contributions improve the performance of OpenMP applications in this kind of environment.
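
    As a concrete illustration of the false-sharing problem mentioned above (a generic OpenMP example, not the runtime technique developed in this work): when per-thread counters are packed into one array they share a cache line, or, on a page-based DSM, the same page, so every update forces coherence traffic between threads or nodes. Padding each counter to its own line removes the interference; the padding size below is an assumption.

    /* False-sharing illustration (generic example, not the thesis's runtime
     * technique): each thread increments only its own counter, but without
     * padding the counters would share a cache line -- or, on a page-based
     * software DSM, a page -- causing constant coherence traffic. */
    #include <omp.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define PAD 64            /* assumed cache-line size in bytes */

    struct padded { long value; char pad[PAD - sizeof(long)]; };

    int main(void)
    {
        struct padded counters[NTHREADS] = {{0}};

        #pragma omp parallel num_threads(NTHREADS)
        {
            int t = omp_get_thread_num();
            for (long i = 0; i < 10000000; i++)
                counters[t].value++;   /* each thread touches its own line */
        }

        long total = 0;
        for (int t = 0; t < NTHREADS; t++)
            total += counters[t].value;
        printf("total = %ld\n", total);
        return 0;
    }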