On Evaluating Commercial Cloud Services: A Systematic Review
Background: Cloud computing is booming in industry, with many competing
providers and services, so evaluation of commercial Cloud services is
necessary. However, the existing evaluation studies are relatively chaotic:
there is considerable confusion, and a gap between practice and theory, in
Cloud services evaluation. Aim: To help relieve this chaos, this work
synthesizes the existing evaluation implementations to outline the state of
the practice and to identify research opportunities in Cloud services
evaluation. Method: Based on a conceptual evaluation model comprising six
steps, the Systematic Literature Review (SLR) method was employed to collect
relevant evidence and investigate Cloud services evaluation step by step.
Results: This SLR identified 82 relevant evaluation studies. The data
collected from these studies represent the current practical landscape of
Cloud services evaluation and can in turn be reused to facilitate future
evaluation work. Conclusions: Evaluation of commercial Cloud services has
become a worldwide research topic. Some findings of this SLR identify
research gaps in Cloud services evaluation (e.g., the elasticity and
security evaluation of commercial Cloud services could be a long-term
challenge), while other findings suggest trends in the adoption of
commercial Cloud services (e.g., compared with PaaS, IaaS seems more
suitable for customers and particularly important in industry). The SLR
study itself also confirms some previous experiences and reveals new
Evidence-Based Software Engineering (EBSE) lessons.
Heuristics for offset assignment in embedded processors
This thesis deals with the optimization of program size and performance in current-generation embedded digital signal processors (DSPs) through the design of optimal memory layouts for data. Given the tight constraints on the size, power consumption, cost and performance of these processors, minimizing code size in terms of the number of instructions required, and the associated reduction in execution time, is important. Several DSPs provide limited addressing modes, and the layout of data, known as offset assignment, plays a critical role in determining code size and performance. Even the simplest variant of the offset assignment problem is NP-complete. The research effort in this area has focused on the design, implementation and evaluation of effective heuristics for several variants of the offset assignment problem. One of the most important factors determining the size, and hence the execution time, of code is the number of instructions required to access the variables stored in processor memory. The indirect addressing mode common in DSPs requires memory accesses to be realized through address registers that hold the address of the memory location to be accessed. The architecture provides instructions for adding to and subtracting from the values of the address registers to compute the addresses of subsequent data that need to be accessed. In addition, some DSP processors include multiple memory banks that allow increased parallelism in memory access. Proper partitioning of variables across memory banks is critical to exploiting this parallelism effectively. The work reported in this thesis develops efficient methods for designing memory layouts when one address register is available (SOA) or when multiple address registers are available (GOA). It also proposes a novel technique for assigning variables to memory banks. This thesis motivates, proposes and evaluates heuristics for all three problems.
For the SOA and GOA problems, the heuristics are implemented and tested on different random sample inputs, and the results obtained are compared to those obtained by prior heuristics. In addition, this thesis provides some insight into the SOA, GOA and the variable partitioning problems
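The core idea behind most SOA heuristics can be illustrated with a small sketch: build an access graph whose edge weights count how often two variables are accessed consecutively, then greedily cover it with paths so that frequently adjacent variables end up next to each other in memory. The uncovered transitions are the ones that cost an explicit address-register update. This is a generic greedy path-cover sketch in the spirit of classic SOA heuristics, not a reproduction of the specific heuristics proposed in the thesis.

```python
from collections import Counter

def soa_layout(access_seq):
    """Greedy path-cover sketch of single-offset-assignment (SOA):
    place variables that are frequently accessed back to back next to
    each other, so auto-increment/decrement addressing can be used."""
    # Edge weight = number of times two distinct variables are adjacent
    # in the access sequence.
    w = Counter()
    for a, b in zip(access_seq, access_seq[1:]):
        if a != b:
            w[frozenset((a, b))] += 1
    degree = Counter()
    parent = {v: v for v in set(access_seq)}
    def find(x):                     # union-find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    chosen = []
    # Take heavier edges first; keep vertex degree <= 2 and avoid cycles,
    # so the chosen edges form disjoint paths (a valid 1-D memory layout).
    for e, wt in sorted(w.items(), key=lambda kv: -kv[1]):
        u, v = tuple(e)
        if degree[u] < 2 and degree[v] < 2 and find(u) != find(v):
            chosen.append((u, v, wt))
            degree[u] += 1
            degree[v] += 1
            parent[find(u)] = find(v)
    covered = sum(wt for _, _, wt in chosen)
    total = sum(w.values())
    # Each uncovered transition needs an explicit address-arithmetic
    # instruction; that count is the SOA cost of this layout.
    return chosen, total - covered
```

For the access sequence a, b, c, a, b, the heaviest edge (a, b) and one of the unit-weight edges are covered, leaving a cost of one explicit address update.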
Concurrent execution of transactional memory applications on heterogeneous processors
Nowadays, to strike a good balance between performance and energy
consumption, manufacturers are starting to offer heterogeneous processors.
These feature two different types of cores: some aimed at energy efficiency
and others at performance. ARM's big.LITTLE processors are one example. To
take good advantage of this kind of processor, programmers must write
multi-threaded programs. In such programs, mutual exclusion mechanisms are
responsible for guaranteeing that, at any given instant, only one thread
accesses the shared data, so as to keep that data consistent. Although these
problems have traditionally been solved with locks, transactional memory
(TM) is gaining importance. Previous work characterized a set of
applications that use a popular software TM library, measuring performance
and energy consumption on both types of cores. Once the applications have
been characterized individually, it is of interest to design an automatic
scheduling system that can assign applications to the cores on which better
performance or lower energy consumption is expected.
In this work we present ScHeTM, a scheduler for TM applications on
heterogeneous multi-core processors. Taking into account the applications'
characteristics in terms of computation time, energy consumption, and the
number of conflicts in the mutual exclusion zone, ScHeTM produces a suitable
schedule with the goal of improving overall system performance.
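A profile-guided scheduler of this kind can be sketched as follows. The field names, thresholds, and decision rule below are hypothetical illustrations of the general approach (pick the core type that minimizes a chosen objective from offline measurements); they are not ScHeTM's actual policy.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    # Per-application metrics measured offline on each core type
    # (hypothetical fields, in the spirit of the characterization
    # described above: compute time, energy, and TM conflict rate).
    time_big: float       # runtime on a performance ("big") core, seconds
    time_little: float    # runtime on an efficiency ("LITTLE") core, seconds
    energy_big: float     # energy on a big core, joules
    energy_little: float  # energy on a LITTLE core, joules
    conflict_rate: float  # fraction of transactions that abort

def assign_core(p: Profile, objective: str = "edp") -> str:
    """Pick the core type that minimizes the chosen objective.
    Highly conflicting workloads are kept on big cores here, on the
    (assumed) rationale that shorter transactions shrink the window
    in which conflicts can occur."""
    if p.conflict_rate > 0.5:
        return "big"
    if objective == "time":
        score_big, score_little = p.time_big, p.time_little
    elif objective == "energy":
        score_big, score_little = p.energy_big, p.energy_little
    else:  # energy-delay product balances both goals
        score_big = p.energy_big * p.time_big
        score_little = p.energy_little * p.time_little
    return "big" if score_big < score_little else "little"
```

With this shape, the same profile data can drive either a performance-first or an energy-first schedule just by switching the objective.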
Energy consumption analysis of Software Transactional Memory on low power processors
Traditionally, in multi-threaded programs, mutual exclusion mechanisms are
implemented using locks, which guarantee that only one thread at a time
accesses the code section in which the shared data are manipulated.
Transactional Memory (TM) is an alternative to locks aimed at achieving
better performance and greater ease of programming. TM can be implemented in
software or hardware, with the software alternatives being more convenient
in terms of flexibility and portability. Recent work has analyzed and
proposed TM solutions in which energy consumption is a factor to take into
account. Much of this work has been carried out on hardware simulators or on
processors oriented toward high-performance computing; studies on physical
low-power hardware have not yet been explored. Finding energy-efficient
software TM solutions on current low-power processors, such as those found
in mobile and embedded devices, is an open research field.
This project performs an energy analysis of a commercially available
software TM library on a low-power device based on ARM processors. The main
objective is to provide performance and energy metrics on the energy
behavior of this library on that processor. An additional objective is the
instrumentation of benchmarks, which provides an indispensable tool for
future research in the area.
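The metrics such an analysis reports typically reduce to integrating sampled power over the benchmark's runtime and combining the result with execution time. The helpers below are a minimal sketch of that arithmetic, assuming periodic power sampling; they are not the thesis's actual instrumentation.

```python
def energy_joules(power_samples, dt):
    """Integrate sampled power (watts) over time with the trapezoidal
    rule to obtain energy in joules; dt is the sampling interval in
    seconds. Assumes evenly spaced samples."""
    if len(power_samples) < 2:
        return 0.0
    return sum((a + b) / 2.0 * dt
               for a, b in zip(power_samples, power_samples[1:]))

def energy_delay_product(energy, runtime):
    """EDP, a common combined energy/performance metric: lower is
    better on both axes simultaneously."""
    return energy * runtime
```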
Memory region: a system abstraction for managing the complex memory structures of multicore platforms
The performance of modern many-core systems depends on the effective use of their complex cache and memory structures, and this will likely become more pronounced with the impending arrival of on-chip 3D stacked memory and non-volatile off-chip byte-addressable memory. Yet to date, operating systems have neither treated memory as a first-class schedulable resource nor embraced memory heterogeneity. This dissertation presents a new software abstraction, called 'memory region', which denotes the current set of physical memory pages actively used by workloads. Using this abstraction, memory resources can be scheduled for applications so as to fully exploit a platform's underlying cache and memory system, thereby gaining improved performance and predictability in execution, particularly for the consolidated workloads seen in virtualized and cloud computing infrastructures. The abstraction's implementation in the Xen hypervisor involves the run-time detection of memory regions, the scheduled mapping of these regions to caches to match performance goals, and the maintenance of region-to-cache mappings using per-cache page tables. This dissertation makes the following specific contributions. First, its region scheduling method proposes that the location of memory blocks, rather than CPU utilization, should be the principal determinant of where workloads are run. Second, treating memory blocks as first-class resources, new methods for efficient cache management are shown to improve application performance as well as the performance of certain operating system functions. Third, explicit memory scheduling makes it possible to disaggregate operating systems, without the need to change OS sources and with only small markups of target guest OS functionality.
With this method, OS functions can be mapped to specific platform components; for example, the file system can be confined to running on specific cores and using only certain memory resources designated for its use. This can improve performance for applications heavily dependent on certain OS functions, by dynamically providing those functions with the resources needed for their current use, and it can prevent performance-critical application functionality from being needlessly perturbed by OS functions used for other purposes or by other jobs. Fourth, extensions of region scheduling can also help applications deal with the heterogeneous memory resources present in future systems, including on-chip stacked DRAM, NUMA, and even NVRAM memory modules. More generally, region scheduling is shown to apply to memory structures with well-defined differences in memory access latencies.
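A common way to confine a set of physical pages to a cache partition, and a useful mental model for region-to-cache mapping, is page coloring: physical pages whose frame numbers map to the same last-level-cache sets share a "color", so restricting a region to a subset of colors restricts which cache sets it can occupy. The sketch below shows the standard color arithmetic only; the dissertation's actual Xen mechanism uses per-cache page tables, not necessarily this exact scheme.

```python
PAGE_SIZE = 4096

def page_color(phys_addr, cache_size, associativity, page_size=PAGE_SIZE):
    """Classic page-coloring computation: two pages with the same color
    compete for the same last-level-cache sets. The number of colors is
    the number of page-sized chunks in one cache way."""
    num_colors = cache_size // (associativity * page_size)
    pfn = phys_addr // page_size       # physical frame number
    return pfn % num_colors

def pages_for_colors(pfns, allowed_colors, cache_size, associativity):
    """Filter a free-page list (by frame number) down to pages whose
    color lies in the allowed set, i.e. pages that will land in the
    designated cache partition."""
    return [p for p in pfns
            if page_color(p * PAGE_SIZE, cache_size, associativity)
            in allowed_colors]
```

For a 2 MiB, 16-way cache with 4 KiB pages this yields 32 colors, so reserving one color hands a region 1/32 of the cache sets.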
Evaluating the Robustness of Resource Allocations Obtained through Performance Modeling with Stochastic Process Algebra
Recent developments in the field of parallel and distributed computing have led to a proliferation of efforts to solve large and computationally intensive mathematical, science, or engineering problems that consist of several parallelizable parts and several non-parallelizable (sequential) parts. In a parallel and distributed computing environment, the performance goal is to optimize the execution of the parallelizable parts of an application on concurrent processors. This requires efficient application scheduling and resource allocation for mapping applications to a set of suitable parallel processors so that the overall performance goal is achieved. However, such computational environments are often prone to unpredictable variations in application (problem and algorithm) and system characteristics. Therefore, a robustness study is required to guarantee a desired level of performance. Given an initial workload, a mapping of applications to resources is considered robust if it optimizes execution performance and guarantees a desired level of performance in the presence of unpredictable perturbations at runtime. In this research, a stochastic process algebra, Performance Evaluation Process Algebra (PEPA), is used to obtain resource allocations via numerical analysis of performance models of the parallel execution of applications on parallel computing resources. The PEPA performance model is translated into an underlying Markov chain model from which performance measures are obtained. Further, a robustness analysis of the allocation techniques is performed to find a robust mapping from a set of initial mapping schemes. The numerical analysis of the performance models has confirmed similarity with the simulation results of earlier research available in the existing literature.
When compared to direct experiments and simulations, numerical models and the corresponding analyses are easier to reproduce, do not incur any setup or installation costs, do not impose any prerequisites for learning a simulation framework, and are not limited by the complexity of the underlying infrastructure or simulation libraries
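The numerical step behind such models is standard: a PEPA model compiles down to a continuous-time Markov chain (CTMC) whose generator matrix Q yields the steady-state distribution pi by solving pi Q = 0 subject to the probabilities summing to one. The sketch below shows that computation on a tiny hypothetical two-state model; it is not the solver used in the research above.

```python
import numpy as np

def steady_state(Q):
    """Steady-state distribution pi of a CTMC with generator matrix Q:
    solve pi @ Q = 0 with sum(pi) = 1 by replacing one (redundant)
    balance equation with the normalization constraint."""
    n = Q.shape[0]
    A = Q.T.copy()
    A[-1, :] = 1.0          # last row now enforces sum(pi) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# Two-state example with hypothetical rates: a processor that goes
# busy -> idle at rate 2 and idle -> busy at rate 1. Rows sum to zero,
# as required of a CTMC generator.
Q = np.array([[-2.0, 2.0],
              [ 1.0, -1.0]])
pi = steady_state(Q)        # pi[0] is the long-run utilization
```

From pi, throughput and utilization measures follow by weighting the activity rates with the corresponding state probabilities.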
Social Sector in a Decentralized Economy
This book is an analytical examination of financing and public service delivery challenges in a decentralized framework. In addition, it provides critical insights into the effectiveness of public expenditure, through benefit incidence analysis of education and healthcare services in India
Advances in Information Security and Privacy
With the recent pandemic emergency, many people have been spending their days working remotely and have increased their use of digital resources for both work and entertainment. As a result, the amount of digital information handled online has increased dramatically, and we can observe a significant increase in the number of attacks, breaches, and hacks. This Special Issue aims to establish the state of the art in protecting information by mitigating information risks. This objective is reached by presenting both surveys on specific topics and original approaches and solutions to specific problems. In total, 16 papers have been published in this Special Issue.
Multi-tasking scheduling for heterogeneous systems
Heterogeneous platforms play an increasingly important role in modern computer
systems. They combine high performance with low power consumption. From mobiles
to supercomputers, we see an increasing number of computer systems that are
heterogeneous.
CPU+GPU platforms, the most well-known heterogeneous systems, have been
widely used in recent years. As they become more mainstream, serving
multiple tasks from multiple users is an emerging challenge. A good
scheduler can greatly improve performance; however, indiscriminately
allocating tasks based on availability leads to poor performance. Since
modern GPUs have a large number of hardware resources, most tasks cannot
efficiently utilize all of them. Concurrent task execution on the GPU is a
promising solution; however, indiscriminately running tasks in parallel
causes a slowdown.
This thesis focuses on scheduling OpenCL kernels. A runtime framework is
developed to determine where to schedule OpenCL kernels: it predicts the
best-fit device using a machine learning-based classifier, then schedules
each kernel accordingly onto either the CPU or the GPU. To improve GPU
utilization, a kernel merging approach is proposed: kernels are merged if
their predicted co-execution can provide better performance than sequential
execution. A machine learning-based classifier is developed to find the best
kernel pairs for co-execution on the GPU. Finally, a runtime framework is
developed to schedule kernels separately on either the CPU or the GPU, and
to run kernels in pairs when their co-execution can improve performance. The
approaches developed in this thesis significantly improve system performance
and outperform all existing techniques.
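The shape of such a scheduler can be sketched as below. The feature set, thresholds, and decision rules are hypothetical stand-ins for the thesis's learned classifiers: a real implementation would train models on measured kernel runtimes rather than hand-code the cutoffs.

```python
from dataclasses import dataclass

@dataclass
class KernelFeatures:
    # Hypothetical static features of an OpenCL kernel, of the kind a
    # compiler pass could extract (the thesis's exact feature set is
    # not reproduced here).
    compute_ops: int   # arithmetic instruction count
    mem_ops: int       # global memory accesses
    work_items: int    # global work size
    branches: int      # divergent branch count

def predict_device(k: KernelFeatures) -> str:
    """Stand-in for the device-placement classifier: large,
    compute-dense, mostly branch-free kernels are predicted to run
    best on the GPU; everything else goes to the CPU."""
    intensity = k.compute_ops / max(k.mem_ops, 1)
    if k.work_items >= 1 << 14 and intensity > 2.0 and k.branches < 8:
        return "GPU"
    return "CPU"

def should_merge(a: KernelFeatures, b: KernelFeatures,
                 gpu_capacity: int = 1 << 16) -> bool:
    """Stand-in for the pairing classifier: merge two GPU-bound kernels
    for co-execution only if their combined work size fits the device,
    i.e. co-running is predicted to beat running them back to back."""
    both_gpu = predict_device(a) == predict_device(b) == "GPU"
    return both_gpu and a.work_items + b.work_items <= gpu_capacity
```

At runtime, the framework would first place each arriving kernel with predict_device, then scan the GPU queue with should_merge to form profitable co-execution pairs.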