12 research outputs found

    On Evaluating Commercial Cloud Services: A Systematic Review

    Background: Cloud computing is booming in industry, with many competing providers and services. Accordingly, evaluation of commercial Cloud services is necessary. However, the existing evaluation studies are relatively chaotic: there is considerable confusion, and a gap between practice and theory, in Cloud services evaluation. Aim: To help relieve this chaos, this work synthesizes the existing evaluation implementations to outline the state of the practice and to identify research opportunities in Cloud services evaluation. Method: Based on a conceptual evaluation model comprising six steps, the Systematic Literature Review (SLR) method was employed to collect relevant evidence and investigate Cloud services evaluation step by step. Results: This SLR identified 82 relevant evaluation studies. The overall data collected from these studies represent the current practical landscape of Cloud services evaluation, and in turn can be reused to facilitate future evaluation work. Conclusions: Evaluation of commercial Cloud services has become a worldwide research topic. Some findings of this SLR identify research gaps in Cloud services evaluation (e.g., elasticity and security evaluation of commercial Cloud services could be a long-term challenge), while other findings suggest trends in the adoption of commercial Cloud services (e.g., compared with PaaS, IaaS seems more suitable for customers and is particularly important in industry). The SLR study itself also confirms some previous experiences and reveals new Evidence-Based Software Engineering (EBSE) lessons.

    Heuristics for offset assignment in embedded processors

    This thesis deals with the optimization of program size and performance in current-generation embedded digital signal processors (DSPs) through the design of optimal memory layouts for data. Given the tight constraints on the size, power consumption, cost and performance of these processors, minimizing code size, in terms of the number of instructions required, and the associated reduction in execution time are important. Several DSPs provide limited addressing modes, and the layout of data, known as offset assignment, plays a critical role in determining code size and performance. Even the simplest variant of the offset assignment problem is NP-complete. Research effort in this area has focused on the design, implementation and evaluation of effective heuristics for several variants of the offset assignment problem. One of the most important factors in determining the size, and hence the execution time, of a program is the number of instructions required to access the variables stored in processor memory. The indirect addressing mode common in DSPs requires memory accesses to be realized through address registers that hold the address of the memory location to be accessed. The architecture provides instructions for adding to and subtracting from the values of the address registers to compute the addresses of subsequent data that need to be accessed. In addition, some DSP processors include multiple memory banks that allow increased parallelism in memory access. Proper partitioning of variables across memory banks is critical to exploiting this parallelism. The work reported in this thesis aims to develop efficient methods for designing memory layouts when one address register is available (simple offset assignment, SOA) or multiple address registers are available (general offset assignment, GOA). It also proposes a novel technique for assigning variables to memory banks. This thesis motivates, proposes and evaluates heuristics for all three problems. For the SOA and GOA problems, the heuristics are implemented and tested on random sample inputs, and the results are compared to those obtained by prior heuristics. In addition, this thesis provides some insight into the SOA, GOA and variable partitioning problems.
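    The single-address-register case (SOA) is classically reduced to a maximum-weight path cover of an access graph whose edge weights count how often two variables are accessed consecutively. The sketch below, in the spirit of Liao's greedy heuristic rather than the specific heuristics of this thesis, picks heavy edges first while avoiding cycles and vertex degrees above two, then lays the resulting paths out in memory:

```python
from collections import Counter

def soa_greedy(access_seq):
    """Greedy single-offset-assignment heuristic (maximum-weight path
    cover flavor). Returns a memory layout, i.e. an ordered list of
    variables; adjacent variables can be stepped between for free with
    auto-increment/decrement addressing."""
    # Edge weight = how often two variables are accessed consecutively.
    weights = Counter()
    for a, b in zip(access_seq, access_seq[1:]):
        if a != b:
            weights[frozenset((a, b))] += 1

    degree = Counter()
    parent = {v: v for v in set(access_seq)}  # union-find to avoid cycles

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    chosen = []
    for edge, _ in weights.most_common():     # heaviest edges first
        a, b = tuple(edge)
        if degree[a] < 2 and degree[b] < 2 and find(a) != find(b):
            chosen.append((a, b))
            degree[a] += 1
            degree[b] += 1
            parent[find(a)] = find(b)

    # Stitch chosen edges into paths, then concatenate the paths.
    adj = {}
    for a, b in chosen:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    layout, seen = [], set()
    for v in sorted(set(access_seq)):
        if v in seen or len(adj.get(v, [])) > 1:
            continue                          # start only at path endpoints
        while v not in seen:
            layout.append(v)
            seen.add(v)
            nxt = [u for u in adj.get(v, []) if u not in seen]
            v = nxt[0] if nxt else v
    return layout

def soa_cost(layout, access_seq):
    """Count accesses needing an explicit address-register load, i.e.
    consecutive accesses more than one slot apart in the layout."""
    pos = {v: i for i, v in enumerate(layout)}
    return sum(1 for a, b in zip(access_seq, access_seq[1:])
               if abs(pos[a] - pos[b]) > 1)
```

    For an access sequence such as `list("abcadbcd")`, the returned layout places frequently consecutive variables next to each other, and `soa_cost` counts the remaining explicit address-register loads.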

    Concurrent execution of transactional memory applications on heterogeneous processors

    Today, to strike a good balance between performance and energy consumption, manufacturers are beginning to offer heterogeneous processors. These feature two different types of cores: some aimed at energy efficiency and others at performance. ARM's big.LITTLE processors are one example. To take good advantage of this kind of processor, programmers must write multi-threaded programs. In such programs, mutual exclusion mechanisms are responsible for guaranteeing that, at any given instant, only one thread accesses the shared data, so as to ensure the consistency of those data. Although these problems have traditionally been solved with locks, transactional memory (TM) is gaining importance. Previous work characterized a set of applications that use a popular software TM library, measuring performance and energy consumption on both types of cores. Once the applications have been characterized individually, it becomes interesting to design an automatic scheduling system that can assign applications to the cores on which better performance or lower energy consumption is expected. In this work we present ScHeTM, a scheduler for TM applications on heterogeneous multi-core processors. Based on the applications' characteristics in terms of computation time, energy consumption, and the number of conflicts in the mutual exclusion zone, ScHeTM produces a suitable schedule with the goal of improving overall system performance.
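    The abstract does not spell out ScHeTM's policy, so the following is only a minimal sketch of the general idea: given per-application profiles (computation time, energy, and conflict rate on each core type), assign each application to the core type that minimizes a weighted runtime/energy cost. The `AppProfile` fields, the `alpha` weight, and the conflict-first ordering are illustrative assumptions, not the thesis's actual algorithm:

```python
from dataclasses import dataclass

@dataclass
class AppProfile:
    """Per-application metrics from offline characterization
    (hypothetical fields; the actual feature set may differ)."""
    name: str
    time_big: float       # runtime on a performance ("big") core, seconds
    time_little: float    # runtime on an efficiency ("LITTLE") core, seconds
    energy_big: float     # energy on a big core, joules
    energy_little: float  # energy on a LITTLE core, joules
    abort_rate: float     # fraction of transactions that conflict and retry

def schedule(apps, n_big, n_little, alpha=0.5):
    """Assign each TM application to a core type by minimizing a weighted
    runtime/energy cost (alpha=1.0 optimizes runtime only, alpha=0.0
    energy only). Conflict-heavy apps are placed first so they get their
    preferred core type while capacity remains."""
    assignment = {}
    free = {"big": n_big, "little": n_little}
    for app in sorted(apps, key=lambda a: a.abort_rate, reverse=True):
        cost_big = alpha * app.time_big + (1 - alpha) * app.energy_big
        cost_little = alpha * app.time_little + (1 - alpha) * app.energy_little
        pref = "big" if cost_big <= cost_little else "little"
        other = "little" if pref == "big" else "big"
        core = pref if free[pref] > 0 else other
        if free[core] > 0:
            free[core] -= 1       # oversubscribe once both types run out
        assignment[app.name] = core
    return assignment
```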

    Energy consumption analysis of Software Transactional Memory on low power processors

    Traditionally, mutual exclusion mechanisms in multi-threaded programs are implemented using locks, which guarantee that only one thread at a time accesses the section of code that manipulates shared data. Transactional Memory (TM) is an alternative to locks aimed at achieving better performance and greater ease of programming. TM can be implemented in software or in hardware, with software alternatives being more convenient in terms of flexibility and portability. Recent work has analyzed and proposed TM solutions in which energy consumption is a factor to take into account. Much of this work has been carried out on hardware simulators or on processors oriented toward high-performance computing; studies on physical hardware designed for low power consumption remain unexplored. Finding energy-efficient software TM solutions on current low-power processors, such as those found in mobile and embedded devices, is an open research field. This project performs an energy analysis of a commercially available software TM library on a low-power device based on ARM processors. The main objective is to provide performance and energy metrics describing the energy behavior of this library on that processor. An additional objective is the instrumentation of test benchmarks, which provides an indispensable tool for future research in this area.
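    As one illustration of benchmark instrumentation, the sketch below samples a power sensor in a background thread while a workload runs and integrates the samples into an energy estimate. The sysfs path is hypothetical, and real ARM boards expose power through different sensors (or only through an external meter); this is not the instrumentation actually used in the project:

```python
import threading
import time

# Hypothetical path; real boards differ (e.g. INA226 sensors) or
# require an external power meter entirely.
POWER_SENSOR = "/sys/class/hwmon/hwmon0/power1_input"

def read_power_watts():
    with open(POWER_SENSOR) as f:
        return int(f.read()) / 1e6      # hwmon reports microwatts

def measure_energy(workload, period=0.01):
    """Run `workload` while sampling power every `period` seconds, then
    integrate the samples (sum of power * dt) to estimate energy."""
    samples, stop = [], threading.Event()

    def sampler():
        while not stop.is_set():
            samples.append((time.monotonic(), read_power_watts()))
            time.sleep(period)

    th = threading.Thread(target=sampler)
    t_start = time.monotonic()
    th.start()
    workload()                          # e.g. one STM benchmark iteration
    elapsed = time.monotonic() - t_start
    stop.set()
    th.join()

    energy = sum((t1 - t0) * p0
                 for (t0, p0), (t1, _) in zip(samples, samples[1:]))
    return {"seconds": elapsed, "joules": energy,
            "avg_watts": energy / elapsed if elapsed else 0.0}
```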

    Memory region: a system abstraction for managing the complex memory structures of multicore platforms

    The performance of modern many-core systems depends on the effective use of their complex cache and memory structures, and this dependence will likely become more pronounced with the impending arrival of on-chip 3D-stacked memory and byte-addressable non-volatile off-chip memory. Yet, to date, operating systems have not treated memory as a first-class schedulable resource that embraces memory heterogeneity. This dissertation presents a new software abstraction, called 'memory region', which denotes the current set of physical memory pages actively used by a workload. Using this abstraction, memory resources can be scheduled for applications to fully exploit a platform's underlying cache and memory system, thereby gaining improved performance and predictability in execution, particularly for the consolidated workloads seen in virtualized and cloud computing infrastructures. The abstraction's implementation in the Xen hypervisor involves the run-time detection of memory regions, the scheduled mapping of these regions to caches to match performance goals, and the maintenance of region-to-cache mappings using per-cache page tables. This dissertation makes the following specific contributions. First, its region scheduling method proposes that the location of memory blocks, rather than CPU utilization, is the principal determinant of where workloads are run. Second, treating memory blocks as first-class resources, new methods for efficient cache management are shown to improve application performance as well as the performance of certain operating system functions. Third, explicit memory scheduling makes it possible to disaggregate operating systems, without the need to change OS sources and with only small markups of target guest OS functionality. With this method, OS functions can be mapped to specific desired platform components, such as a file system confined to running on specific cores and using only certain memory resources designated for its use. This can improve performance for applications heavily dependent on certain OS functions, by dynamically providing those functions with the resources needed for their current use, and it can prevent performance-critical application functionality from being needlessly perturbed by OS functions used for other purposes or by other jobs. Fourth, extensions of region scheduling can also help applications deal with the heterogeneous memory resources present in future systems, including on-chip stacked DRAM, NUMA, and even NVRAM memory modules. More generally, region scheduling is shown to apply to any memory structure with well-defined differences in memory access latencies.
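    Per-cache page tables are hypervisor internals that do not condense into a short sketch, but page coloring, the classic mechanism for mapping physical pages to last-level cache partitions, conveys the flavor of region-to-cache scheduling. The cache geometry and helper names below are illustrative assumptions, not the dissertation's Xen implementation:

```python
PAGE_SHIFT = 12          # 4 KiB pages
LINE_SHIFT = 6           # 64-byte cache lines
# Hypothetical LLC geometry: 2 MiB, 16-way => 2048 sets.
NUM_SETS = (2 * 1024 * 1024) // (16 * 64)
SET_SHIFT = LINE_SHIFT   # set index starts just above the line offset

def page_color(phys_addr):
    """A page's color = the set-index bits that lie above the page
    offset. All lines of a page share one color, so pages of different
    colors can never evict each other from the LLC."""
    set_index = (phys_addr >> SET_SHIFT) % NUM_SETS
    return set_index >> (PAGE_SHIFT - SET_SHIFT)

NUM_COLORS = NUM_SETS >> (PAGE_SHIFT - SET_SHIFT)   # 32 colors here

def pick_frame(free_frames, allowed_colors):
    """Allocate a physical frame whose color belongs to the cache
    partition reserved for this memory region (or None if exhausted)."""
    for pfn in free_frames:
        if page_color(pfn << PAGE_SHIFT) in allowed_colors:
            free_frames.remove(pfn)
            return pfn
    return None
```

    Restricting a region's allocations to a subset of colors confines that region to the corresponding fraction of the cache, which is how coloring-based schemes isolate consolidated workloads from one another.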

    Eight Biennial Report: April 2005 – March 2007


    Evaluating the Robustness of Resource Allocations Obtained through Performance Modeling with Stochastic Process Algebra

    Recent developments in the field of parallel and distributed computing have led to a proliferation of efforts to solve large and computationally intensive mathematical, science, and engineering problems that consist of several parallelizable parts and several non-parallelizable (sequential) parts. In a parallel and distributed computing environment, the performance goal is to optimize the execution of the parallelizable parts of an application on concurrent processors. This requires efficient application scheduling and resource allocation for mapping applications to a set of suitable parallel processors such that the overall performance goal is achieved. However, such computational environments are often prone to unpredictable variations in application (problem and algorithm) and system characteristics. Therefore, a robustness study is required to guarantee a desired level of performance. Given an initial workload, a mapping of applications to resources is considered robust if that mapping optimizes execution performance and guarantees a desired level of performance in the presence of unpredictable perturbations at runtime. In this research, a stochastic process algebra, Performance Evaluation Process Algebra (PEPA), is used to obtain resource allocations via numerical analysis of performance models of the parallel execution of applications on parallel computing resources. The PEPA performance model is translated into an underlying Markov chain model from which performance measures are obtained. Further, a robustness analysis of the allocation techniques is performed to find a robust mapping from a set of initial mapping schemes. The numerical analysis of the performance models confirms the simulation results of earlier research available in the existing literature. Compared to direct experiments and simulations, numerical models and the corresponding analyses are easier to reproduce, do not incur setup or installation costs, do not impose prerequisites for learning a simulation framework, and are not limited by the complexity of the underlying infrastructure or simulation libraries.
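    A PEPA model compiles to a continuous-time Markov chain whose steady-state distribution pi satisfies pi Q = 0 with sum(pi) = 1; performance measures such as utilization are then read off pi. The sketch below solves this with NumPy for a tiny illustrative generator matrix (the states and rates are made up, not taken from the thesis):

```python
import numpy as np

def steady_state(Q):
    """Steady-state distribution pi of a CTMC with generator matrix Q:
    solve pi @ Q = 0 subject to sum(pi) = 1, by replacing one balance
    equation with the normalization constraint."""
    n = Q.shape[0]
    A = Q.T.copy()
    A[-1, :] = 1.0                  # normalization row
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# Illustrative 3-state model: 0 = task queued, 1 = executing,
# 2 = perturbed/retrying. Rows of a generator matrix sum to zero.
Q = np.array([[-2.0,  2.0,  0.0],   # dispatch at rate 2
              [ 1.0, -1.5,  0.5],   # complete at 1, perturbation at 0.5
              [ 3.0,  0.0, -3.0]])  # recover at rate 3
pi = steady_state(Q)
utilization = pi[1]                 # probability the processor is busy
print(pi, utilization)
```

    Robustness of a mapping can then be probed by perturbing the rates in Q and checking how far such measures drift from their nominal values.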

    Social Sector in a Decentralized Economy

    This book is an analytical examination of financing and public service delivery challenges in a decentralized framework. In addition, it provides critical insights into the effectiveness of public expenditure through a benefit incidence analysis of education and healthcare services in India.

    Advances in Information Security and Privacy

    With the recent pandemic emergency, many people spend their days working remotely and have increased their use of digital resources for both work and entertainment. As a result, the amount of digital information handled online has increased dramatically, and we can observe a significant increase in the number of attacks, breaches, and hacks. This Special Issue aims to establish the state of the art in protecting information by mitigating information risks. This objective is reached by presenting both surveys on specific topics and original approaches and solutions to specific problems. In total, 16 papers have been published in this Special Issue.

    Multi-tasking scheduling for heterogeneous systems

    Heterogeneous platforms play an increasingly important role in modern computer systems. They combine high performance with low power consumption. From mobile devices to supercomputers, we see an increasing number of computer systems that are heterogeneous. CPU+GPU platforms, the best-known heterogeneous systems, have been widely used in recent years. As they become more mainstream, serving multiple tasks from multiple users is an emerging challenge, and a good scheduler can greatly improve performance. However, indiscriminately allocating tasks based on availability leads to poor performance. As modern GPUs have a large number of hardware resources, most tasks cannot efficiently utilize all of them. Concurrent task execution on the GPU is a promising solution; however, indiscriminately running tasks in parallel causes slowdowns. This thesis focuses on scheduling OpenCL kernels. A runtime framework is developed to determine where to schedule OpenCL kernels: it predicts the best-fit device using a machine-learning-based classifier, then schedules each kernel accordingly to either the CPU or the GPU. To improve GPU utilization, a kernel-merging approach is proposed: kernels are merged if their predicted co-execution provides better performance than sequential execution, and a machine-learning-based classifier is developed to find the best kernel pairs for co-execution on the GPU. Finally, a runtime framework is developed to schedule kernels separately on either the CPU or the GPU, and to run kernels in pairs when their co-execution can improve performance. The approaches developed in this thesis significantly improve system performance and outperform all existing techniques.
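    As a minimal sketch of the classifier-based placement idea, the code below trains a decision tree on profiled kernels labeled with their best-fit device and dispatches new kernels accordingly. The feature set, the scikit-learn model choice, and the training data are assumptions for illustration; the thesis's actual predictor and features may differ:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical kernel features, in the spirit of the thesis's ML-based
# device predictor (the actual feature set is not given here):
# [compute ops, memory ops, branch count, global work size]
FEATURES = ["compute_ops", "mem_ops", "branches", "work_items"]

def train_device_predictor(profiles):
    """profiles: list of (feature_vector, best_device) pairs collected
    by timing training kernels on both devices; best_device is the
    string 'cpu' or 'gpu'."""
    X = [f for f, _ in profiles]
    y = [d for _, d in profiles]
    return DecisionTreeClassifier(max_depth=5).fit(X, y)

def schedule_kernel(model, kernel_features, queues):
    """Dispatch a kernel to the queue of its predicted best-fit device."""
    device = model.predict([kernel_features])[0]
    queues[device].append(kernel_features)
    return device

# Toy usage with made-up training data:
training = [
    ([900, 100,  4, 1 << 20], "gpu"),  # compute-heavy, massively parallel
    ([120, 800, 64, 1 << 8],  "cpu"),  # branchy, memory-bound, small
    ([700, 200,  8, 1 << 18], "gpu"),
    ([150, 600, 80, 1 << 10], "cpu"),
]
model = train_device_predictor(training)
queues = {"cpu": [], "gpu": []}
print(schedule_kernel(model, [800, 150, 6, 1 << 19], queues))  # likely 'gpu'
```

    The kernel-merging step could reuse the same pattern with pairwise features, predicting whether two kernels co-execute faster than they run back-to-back.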