1,215 research outputs found

    Speculative Validation of Web Objects for Further Reducing the User-Perceived Latency

    Full text link

    A taxonomy of web prediction algorithms

    Full text link
    Web prefetching techniques are an attractive solution to reduce the user-perceived latency. These techniques are driven by a prediction engine or algorithm that guesses following actions of web users. A large amount of prediction algorithms has been proposed since the first prefetching approach was published, although it is only over the last two or three years when they have begun to be successfully implemented in commercial products. These algorithms can be implemented in any element of the web architecture and can use a wide variety of information as input. This affects their structure, data system, computational resources and accuracy. The knowledge of the input information and the understanding of how it can be handled to make predictions can help to improve the design of current prediction engines, and consequently prefetching techniques. This paper analyzes fifty of the most relevant algorithms proposed along 15 years of prefetching research and proposes a taxonomy where the algorithms are classified according to the input data they use. For each group, the main advantages and shortcomings are highlighted. © 2012 Elsevier Ltd. All rights reserved.This work has been partially supported by Spanish Ministry of Science and Innovation under Grant TIN2009-08201, Generalitat Valenciana under Grant GV/2011/002 and Universitat Politecnica de Valencia under Grant PAID-06-10/2424.Domenech, J.; De La Ossa Perez, BA.; Sahuquillo Borrás, J.; Gil Salinas, JA.; Pont Sanjuan, A. (2012). A taxonomy of web prediction algorithms. Expert Systems with Applications. 39(9):8496-8502. https://doi.org/10.1016/j.eswa.2012.01.140S8496850239

    The parallel event loop model and runtime: a parallel programming model and runtime system for safe event-based parallel programming

    Get PDF
    Recent trends in programming models for server-side development have shown an increasing popularity of event-based single- threaded programming models based on the combination of dynamic languages such as JavaScript and event-based runtime systems for asynchronous I/O management such as Node.JS. Reasons for the success of such models are the simplicity of the single-threaded event-based programming model as well as the growing popularity of the Cloud as a deployment platform for Web applications. Unfortunately, the popularity of single-threaded models comes at the price of performance and scalability, as single-threaded event-based models present limitations when parallel processing is needed, and traditional approaches to concurrency such as threads and locks don't play well with event-based systems. This dissertation proposes a programming model and a runtime system to overcome such limitations by enabling single-threaded event-based applications with support for speculative parallel execution. The model, called Parallel Event Loop, has the goal of bringing parallel execution to the domain of single-threaded event-based programming without relaxing the main characteristics of the single-threaded model, and therefore providing developers with the impression of a safe, single-threaded, runtime. Rather than supporting only pure single-threaded programming, however, the parallel event loop can also be used to derive safe, high-level, parallel programming models characterized by a strong compatibility with single-threaded runtimes. We describe three distinct implementations of speculative runtimes enabling the parallel execution of event-based applications. The first implementation we describe is a pessimistic runtime system based on locks to implement speculative parallelization. The second and the third implementations are based on two distinct optimistic runtimes using software transactional memory. Each of the implementations supports the parallelization of applications written using an asynchronous single-threaded programming style, and each of them enables applications to benefit from parallel execution

    Just-In-Time Push Prefetching: Accelerating the Mobile Web

    Get PDF
    Web pages take noticeably longer to load when accessing the Internet using high-latency wide-area wireless networks like 3G. This delay can result in lower user satisfaction and lost revenue for web site operators. By locating a just-in-time prefetching push proxy in the Internet service provider's mobile network core and routing mobile client web requests through it, web page load times can be perceivably reduced. Our analysis and experimental results demonstrate that the use of a push proxy results in a much smaller dependency on the mobile-client-to-network latency than seen in environments where no proxy is used; in particular, only one full round trip from client to server is necessary regardless of the number of resources referenced by a web page. In addition, we find that the ideal location for a push proxy is close to the servers that the mobile client accesses, minimizing the latency between the proxy and the servers that the mobile client accesses through it; this is in contrast to traditional prefetching proxies that do not push prefetched items to the client, which are best deployed halfway between the client and the server

    The parallel event loop model and runtime: a parallel programming model and runtime system for safe event-based parallel programming

    Get PDF
    Recent trends in programming models for server-side development have shown an increasing popularity of event-based single- threaded programming models based on the combination of dynamic languages such as JavaScript and event-based runtime systems for asynchronous I/O management such as Node.JS. Reasons for the success of such models are the simplicity of the single-threaded event-based programming model as well as the growing popularity of the Cloud as a deployment platform for Web applications. Unfortunately, the popularity of single-threaded models comes at the price of performance and scalability, as single-threaded event-based models present limitations when parallel processing is needed, and traditional approaches to concurrency such as threads and locks don't play well with event-based systems. This dissertation proposes a programming model and a runtime system to overcome such limitations by enabling single-threaded event-based applications with support for speculative parallel execution. The model, called Parallel Event Loop, has the goal of bringing parallel execution to the domain of single-threaded event-based programming without relaxing the main characteristics of the single-threaded model, and therefore providing developers with the impression of a safe, single-threaded, runtime. Rather than supporting only pure single-threaded programming, however, the parallel event loop can also be used to derive safe, high-level, parallel programming models characterized by a strong compatibility with single-threaded runtimes. We describe three distinct implementations of speculative runtimes enabling the parallel execution of event-based applications. The first implementation we describe is a pessimistic runtime system based on locks to implement speculative parallelization. The second and the third implementations are based on two distinct optimistic runtimes using software transactional memory. Each of the implementations supports the parallelization of applications written using an asynchronous single-threaded programming style, and each of them enables applications to benefit from parallel execution

    Adaptive Caching of Distributed Components

    Get PDF
    Die Zugriffslokalität referenzierter Daten ist eine wichtige Eigenschaft verteilter Anwendungen. Lokales Zwischenspeichern abgefragter entfernter Daten (Caching) wird vielfach bei der Entwicklung solcher Anwendungen eingesetzt, um diese Eigenschaft auszunutzen. Anschliessende Zugriffe auf diese Daten können so beschleunigt werden, indem sie aus dem lokalen Zwischenspeicher bedient werden. Gegenwärtige Middleware-Architekturen bieten dem Anwendungsprogrammierer jedoch kaum Unterstützung für diesen nicht-funktionalen Aspekt. Die vorliegende Arbeit versucht deshalb, Caching als separaten, konfigurierbaren Middleware-Dienst auszulagern. Durch die Einbindung in den Softwareentwicklungsprozess wird die frühzeitige Modellierung und spätere Wiederverwendung caching-spezifischer Metadaten gewährleistet. Zur Laufzeit kann sich das entwickelte System außerdem bezüglich der Cachebarkeit von Daten adaptiv an geändertes Nutzungsverhalten anpassen.Locality of reference is an important property of distributed applications. Caching is typically employed during the development of such applications to exploit this property by locally storing queried data: Subsequent accesses can be accelerated by serving their results immediately form the local store. Current middleware architectures however hardly support this non-functional aspect. The thesis at hand thus tries outsource caching as a separate, configurable middleware service. Integration into the software development lifecycle provides for early capturing, modeling, and later reuse of cachingrelated metadata. At runtime, the implemented system can adapt to caching access characteristics with respect to data cacheability properties, thus healing misconfigurations and optimizing itself to an appropriate configuration. Speculative prefetching of data probably queried in the immediate future complements the presented approach

    Evaluating techniques for parallelization tuning in MPI, OmpSs and MPI/OmpSs

    Get PDF
    Parallel programming is used to partition a computational problem among multiple processing units and to define how they interact (communicate and synchronize) in order to guarantee the correct result. The performance that is achieved when executing the parallel program on a parallel architecture is usually far from the optimal: computation unbalance and excessive interaction among processing units often cause lost cycles, reducing the efficiency of parallel computation. In this thesis we propose techniques oriented to better exploit parallelism in parallel applications, with emphasis in techniques that increase asynchronism. Theoretically, this type of parallelization tuning promises multiple benefits. First, it should mitigate communication and synchronization delays, thus increasing the overall performance. Furthermore, parallelization tuning should expose additional parallelism and therefore increase the scalability of execution. Finally, increased asynchronism would provide higher tolerance to slower networks and external noise. In the first part of this thesis, we study the potential for tuning MPI parallelism. More specifically, we explore automatic techniques to overlap communication and computation. We propose a speculative messaging technique that increases the overlap and requires no changes of the original MPI application. Our technique automatically identifies the application’s MPI activity and reinterprets that activity using optimally placed non-blocking MPI requests. We demonstrate that this overlapping technique increases the asynchronism of MPI messages, maximizing the overlap, and consequently leading to execution speedup and higher tolerance to bandwidth reduction. However, in the case of realistic scientific workloads, we show that the overlapping potential is significantly limited by the pattern by which each MPI process locally operates on MPI messages. In the second part of this thesis, we study the potential for tuning hybrid MPI/OmpSs parallelism. We try to gain a better understanding of the parallelism of hybrid MPI/OmpSs applications in order to evaluate how these applications would execute on future machines and to predict the execution bottlenecks that are likely to emerge. We explore how MPI/OmpSs applications could scale on the parallel machine with hundreds of cores per node. Furthermore, we investigate how this high parallelism within each node would reflect on the network constraints. We especially focus on identifying critical code sections in MPI/OmpSs. We devised a technique that quickly evaluates, for a given MPI/OmpSs application and the selected target machine, which code section should be optimized in order to gain the highest performance benefits. Also, this thesis studies techniques to quickly explore the potential OmpSs parallelism inherent in applications. We provide mechanisms to easily evaluate potential parallelism of any task decomposition. Furthermore, we describe an iterative trialand-error approach to search for a task decomposition that will expose sufficient parallelism for a given target machine. Finally, we explore potential of automating the iterative approach by capturing the programmers’ experience into an expert system that can autonomously lead the search process. Also, throughout the work on this thesis, we designed development tools that can be useful to other researchers in the field. The most advanced of these tools is Tareador – a tool to help porting MPI applications to MPI/OmpSs programming model. Tareador provides a simple interface to propose some decomposition of a code into OmpSs tasks. Tareador dynamically calculates data dependencies among the annotated tasks, and automatically estimates the potential OmpSs parallelization. Furthermore, Tareador gives additional hints on how to complete the process of porting the application to OmpSs. Tareador already proved itself useful, by being included in the academic classes on parallel programming at UPC.La programación paralela consiste en dividir un problema de computación entre múltiples unidades de procesamiento y definir como interactúan (comunicación y sincronización) para garantizar un resultado correcto. El rendimiento de un programa paralelo normalmente está muy lejos de ser óptimo: el desequilibrio de la carga computacional y la excesiva interacción entre las unidades de procesamiento a menudo causa ciclos perdidos, reduciendo la eficiencia de la computación paralela. En esta tesis proponemos técnicas orientadas a explotar mejor el paralelismo en aplicaciones paralelas, poniendo énfasis en técnicas que incrementan el asincronismo. En teoría, estas técnicas prometen múltiples beneficios. Primero, tendrían que mitigar el retraso de la comunicación y la sincronización, y por lo tanto incrementar el rendimiento global. Además, la calibración de la paralelización tendría que exponer un paralelismo adicional, incrementando la escalabilidad de la ejecución. Finalmente, un incremente en el asincronismo proveería una tolerancia mayor a redes de comunicación lentas y ruido externo. En la primera parte de la tesis, estudiamos el potencial para la calibración del paralelismo a través de MPI. En concreto, exploramos técnicas automáticas para solapar la comunicación con la computación. Proponemos una técnica de mensajería especulativa que incrementa el solapamiento y no requiere cambios en la aplicación MPI original. Nuestra técnica identifica automáticamente la actividad MPI de la aplicación y la reinterpreta usando solicitudes MPI no bloqueantes situadas óptimamente. Demostramos que esta técnica maximiza el solapamiento y, en consecuencia, acelera la ejecución y permite una mayor tolerancia a las reducciones de ancho de banda. Aún así, en el caso de cargas de trabajo científico realistas, mostramos que el potencial de solapamiento está significativamente limitado por el patrón según el cual cada proceso MPI opera localmente en el paso de mensajes. En la segunda parte de esta tesis, exploramos el potencial para calibrar el paralelismo híbrido MPI/OmpSs. Intentamos obtener una comprensión mejor del paralelismo de aplicaciones híbridas MPI/OmpSs para evaluar de qué manera se ejecutarían en futuras máquinas. Exploramos como las aplicaciones MPI/OmpSs pueden escalar en una máquina paralela con centenares de núcleos por nodo. Además, investigamos cómo este paralelismo de cada nodo se reflejaría en las restricciones de la red de comunicación. En especia, nos concentramos en identificar secciones críticas de código en MPI/OmpSs. Hemos concebido una técnica que rápidamente evalúa, para una aplicación MPI/OmpSs dada y la máquina objetivo seleccionada, qué sección de código tendría que ser optimizada para obtener la mayor ganancia de rendimiento. También estudiamos técnicas para explorar rápidamente el paralelismo potencial de OmpSs inherente en las aplicaciones. Proporcionamos mecanismos para evaluar fácilmente el paralelismo potencial de cualquier descomposición en tareas. Además, describimos una aproximación iterativa para buscar una descomposición en tareas que mostrará el suficiente paralelismo en la máquina objetivo dada. Para finalizar, exploramos el potencial para automatizar la aproximación iterativa. En el trabajo expuesto en esta tesis hemos diseñado herramientas que pueden ser útiles para otros investigadores de este campo. La más avanzada es Tareador, una herramienta para ayudar a migrar aplicaciones al modelo de programación MPI/OmpSs. Tareador proporciona una interfaz simple para proponer una descomposición del código en tareas OmpSs. Tareador también calcula dinámicamente las dependencias de datos entre las tareas anotadas, y automáticamente estima el potencial de paralelización OmpSs. Por último, Tareador da indicaciones adicionales sobre como completar el proceso de migración a OmpSs. Tareador ya se ha mostrado útil al ser incluido en las clases de programación de la UPC

    Functional programming abstractions for weakly consistent systems

    Get PDF
    In recent years, there has been a wide-spread adoption of both multicore and cloud computing. Traditionally, concurrent programmers have relied on the underlying system providing strong memory consistency, where there is a semblance of concurrent tasks operating over a shared global address space. However, providing scalable strong consistency guarantees as the scale of the system grows is an increasingly difficult endeavor. In a multicore setting, the increasing complexity and the lack of scalability of hardware mechanisms such as cache coherence deters scalable strong consistency. In geo-distributed compute clouds, the availability concerns in the presence of partial failures prohibit strong consistency. Hence, modern multicore and cloud computing platforms eschew strong consistency in favor of weakly consistent memory, where each task\u27s memory view is incomparable with the other tasks. As a result, programmers on these platforms must tackle the full complexity of concurrent programming for an asynchronous distributed system. ^ This dissertation argues that functional programming language abstractions can simplify scalable concurrent programming for weakly consistent systems. Functional programming espouses mutation-free programming, and rare mutations when present are explicit in their types. By controlling and explicitly reasoning about shared state mutations, functional abstractions simplify concurrent programming. Building upon this intuition, this dissertation presents three major contributions, each focused on addressing a particular challenge associated with weakly consistent loosely coupled systems. First, it describes A NERIS, a concurrent functional programming language and runtime for the Intel Single-chip Cloud Computer, and shows how to provide an efficient cache coherent virtual address space on top of a non cache coherent multicore architecture. Next, it describes RxCML, a distributed extension of MULTIMLTON and shows that, with the help of speculative execution, synchronous communication can be utilized as an efficient abstraction for programming asynchronous distributed systems. Finally, it presents QUELEA, a programming system for eventually consistent distributed stores, and shows that the choice of correct consistency level for replicated data type operations and transactions can be automated with the help of high-level declarative contracts

    Control plane optimization in Software Defined Networking and task allocation for Fog Computing

    Get PDF
    As the next generation of mobile wireless standard, the fifth generation (5G) of cellular/wireless network has drawn worldwide attention during the past few years. Due to its promise of higher performance over the legacy 4G network, an increasing number of IT companies and institutes have started to form partnerships and create 5G products. Emerging techniques such as Software Defined Networking and Mobile Edge Computing are also envisioned as key enabling technologies to augment 5G competence. However, as popular and promising as it is, 5G technology still faces several intrinsic challenges such as (i) the strict requirements in terms of end-to-end delays, (ii) the required reliability in the control plane and (iii) the minimization of the energy consumption. To cope with these daunting issues, we provide the following main contributions. As first contribution, we address the problem of the optimal placement of SDN controllers. Specifically, we give a detailed analysis of the impact that controller placement imposes on the reactivity of SDN control plane, due to the consistency protocols adopted to manage the data structures that are shared across different controllers. We compute the Pareto frontier, showing all the possible tradeoffs achievable between the inter-controller delays and the switch-to-controller latencies. We define two data-ownership models and formulate the controller placement problem with the goal of minimizing the reaction time of control plane, as perceived by a switch. We propose two evolutionary algorithms, namely Evo-Place and Best-Reactivity, to compute the Pareto frontier and the controller placement minimizing the reaction time, respectively. Experimental results show that Evo-Place outperforms its random counterpart, and Best-Reactivity can achieve a relative error of <= 30% with respect to the optimal algorithm by only sampling less than 10% of the whole solution space. As second contribution, we propose a stateful SDN approach to improve the scalability of traffic classification in SDN networks. In particular, we leverage the OpenState extension to OpenFlow to deploy state machines inside the switch and minimize the number of packets redirected to the traffic classifier. We experimentally compare two approaches, namely Simple Count-Down (SCD) and Compact Count-Down (CCD), to scale the traffic classifier and minimize the flow table occupancy. As third contribution, we propose an approach to improve the reliability of SDN controllers. We implement BeCheck, which is a software framework to detect ``misbehaving'' controllers. BeCheck resides transparently between the control plane and data plane, and monitors the exchanged OpenFlow traffic messages. We implement three policies to detect misbehaving controllers and forward the intercepted messages. BeCheck along with the different policies are validated in a real test-bed. As fourth contribution, we investigate a mobile gaming scenario in the context of fog computing, denoted as Integrated Mobile Gaming (IMG) scenario. We partition mobile games into individual tasks and cognitively offload them either to the cloud or the neighbor mobile devices, so as to achieve minimal energy consumption. We formulate the IMG model as an ILP problem and propose a heuristic named Task Allocation with Minimal Energy cost (TAME). Experimental results show that TAME approaches the optimal solutions while outperforming two other state-of-the-art task offloading algorithms
    • …
    corecore