17 research outputs found

    Bid-Centric Cloud Service Provisioning

    Full text link
    Bid-centric service descriptions have the potential to offer a new cloud service provisioning model that promotes portability, diversity of choice and differentiation between providers. A bid-matching model based on requirements and capabilities is presented that provides the basis for such an approach. To facilitate the bidding process, tenders should be specified as abstractly as possible so that the solution space is not needlessly restricted. To this end, we describe how partial TOSCA service descriptions allow a range of diverse solutions to be proposed by multiple providers in response to tenders. Rather than adopting a lowest-common-denominator approach, true portability should allow the relative strengths and differentiating features of cloud service providers to be applied to bids. With this in mind, we describe how TOSCA service descriptions could be augmented with additional information to facilitate heterogeneity in proposed solutions, such as the use of coprocessors and provider-specific services.
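The requirements-and-capabilities matching the abstract describes can be sketched as a small predicate: a tender states abstract minimums, each bid states concrete capabilities, and a bid qualifies when every requirement is satisfied. All names and fields below are illustrative assumptions, not TOSCA syntax or the paper's model.

```python
# Hypothetical bid-matching sketch: numeric requirements are treated as
# minimums, all other requirements must match exactly.

def bid_matches(requirements: dict, capabilities: dict) -> bool:
    """Return True when every tender requirement is met by the bid."""
    for key, needed in requirements.items():
        offered = capabilities.get(key)
        if offered is None:
            return False                       # capability not offered at all
        if isinstance(needed, (int, float)):
            if offered < needed:               # numeric requirement: a minimum
                return False
        elif offered != needed:                # otherwise: exact match
            return False
    return True

tender = {"vcpus": 8, "memory_gb": 32, "region": "eu"}
bids = {
    "provider_a": {"vcpus": 16, "memory_gb": 64, "region": "eu",
                   "coprocessor": "gpu"},      # extra capabilities differentiate
    "provider_b": {"vcpus": 4, "memory_gb": 32, "region": "eu"},
}
qualified = [name for name, caps in bids.items() if bid_matches(tender, caps)]
print(qualified)  # provider_a exceeds all minimums; provider_b has too few vCPUs
```

Note that provider_a's extra `coprocessor` capability does not disqualify it; surplus capabilities are exactly where provider differentiation would surface in such a scheme.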

    Portable parallel kernels for high-speed beamforming in synthetic aperture ultrasound imaging

    Get PDF
    In medical ultrasound, synthetic aperture (SA) imaging is widely regarded as a novel image formation technique that achieves superior resolution to that offered by existing scanners. However, its intensive processing load is known to be a challenging factor. To address this computational demand, this paper proposes a new parallel approach based on the design of OpenCL signal processing kernels that can compute SA image formation in real time. We demonstrate how these kernels can be ported onto different classes of parallel processors, namely multi-core CPUs and GPUs, whose multi-threaded computing resources are able to process more than 250 fps. Moreover, they have strong potential to support the development of more complex algorithms, thus increasing the depth range of the inspected human volume and the final image resolution observed by the medical practitioner.
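The core of SA image formation is delay-and-sum: for each focal point, sum the per-element RF samples at the round-trip propagation delay. A minimal NumPy sketch of that inner computation is below; the geometry, sampling rate and array layout are assumptions for illustration, and a real OpenCL kernel would distribute the focal points across GPU work-items rather than loop in Python.

```python
import numpy as np

def delay_and_sum(rf, elem_x, focus, c=1540.0, fs=40e6):
    """Delay-and-sum one focal point from one emission.

    rf     : (n_elements, n_samples) received RF data
    elem_x : (n_elements,) lateral element positions [m]
    focus  : (x, z) focal point [m]; emission assumed from the array centre
    c, fs  : speed of sound [m/s] and sampling rate [Hz] (assumed values)
    """
    fx, fz = focus
    tx_dist = np.hypot(fx - elem_x.mean(), fz)   # emission path to the focus
    rx_dist = np.hypot(fx - elem_x, fz)          # per-element return paths
    delays = (tx_dist + rx_dist) / c             # round-trip times [s]
    idx = np.clip(np.round(delays * fs).astype(int), 0, rf.shape[1] - 1)
    return rf[np.arange(rf.shape[0]), idx].sum() # coherent sum across elements

rng = np.random.default_rng(0)
rf = rng.standard_normal((64, 4096))             # stand-in RF data, 64 elements
elem_x = np.linspace(-9.6e-3, 9.6e-3, 64)        # 0.3 mm pitch linear array
value = delay_and_sum(rf, elem_x, focus=(0.0, 30e-3))
print(value)
```

Repeating this per pixel of the output grid, per emission, is what makes SA imaging so costly, and why mapping one focal point to one GPU thread pays off.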

    Implementation of an algorithm for impulsive noise removal in images and comparative analysis of response times on GPU and CPU architectures

    Get PDF
    The general aim of this work was to determine, in digital image processing, the response times obtained when implementing an algorithm on different architectures (CPU and GPU), using interpolation through radial basis functions. To meet this objective, the work starts from prior research on impulsive noise removal in images, from which an algorithm suited to the CPU and GPU architectures is formulated from a pseudocode solution. For the GPU architecture, the particularities identified during implementation (using CUDA technology) are detailed, including platform restrictions and implementation alternatives. Following the implementation, a set of tests is carried out on images corrupted with salt-and-pepper noise and of different dimensions (width, height); these tests seek to determine the response times of the algorithm implemented on the two architectures when removing noise. Analysis of the response-time results mainly shows correct removal of the noisy pixels (which reach 55,000 in a single image) on both architectures and, additionally, a clearly lower response time (faster processing) on the CPU architecture compared to the GPU architecture.
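The two-step scheme the abstract describes, detect the corrupted pixels, then replace each from its clean neighbours, can be sketched as follows. A plain neighbourhood median stands in for the paper's radial-basis-function interpolation, and the detection rule (extreme values only) is an assumption; on a GPU, each pixel would map to one CUDA thread.

```python
import numpy as np

def remove_impulse_noise(img):
    """Replace salt (255) and pepper (0) pixels with the median of clean
    3x3 neighbours; other pixels pass through unchanged."""
    noisy = (img == 0) | (img == 255)            # detection: extreme values
    out = img.copy()
    h, w = img.shape
    for y, x in zip(*np.nonzero(noisy)):
        y0, y1 = max(y - 1, 0), min(y + 2, h)    # clamp window at borders
        x0, x1 = max(x - 1, 0), min(x + 2, w)
        window = img[y0:y1, x0:x1]
        clean = window[(window != 0) & (window != 255)]
        if clean.size:                           # interpolate from clean pixels
            out[y, x] = np.median(clean)
    return out

img = np.full((5, 5), 100, dtype=np.uint8)
img[2, 2] = 255                                  # inject one "salt" pixel
print(remove_impulse_noise(img)[2, 2])           # prints 100: restored value
```

The per-pixel independence of this loop is what makes the GPU mapping natural, even though, as the paper's measurements show, the CPU version can still win on response time.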

    MURAC: A unified machine model for heterogeneous computers

    Get PDF
    Heterogeneous computing enables the performance and energy advantages of multiple distinct processing architectures to be efficiently exploited within a single machine. These systems are capable of delivering large performance increases by matching applications to the architectures most suited to them. The Multiple Runtime-reconfigurable Architecture Computer (MURAC) model has been proposed to tackle the problems commonly found in the design and usage of these machines. This model presents a system-level approach that creates a clear separation of concerns between the system implementer and the application developer. The three key concepts that make up the MURAC model are a unified machine model, a unified instruction stream and a unified memory space. A simple programming model built upon these abstractions provides the user application with a consistent interface for interacting with the underlying machine. This programming model simplifies application partitioning between hardware and software and allows the easy integration of different execution models within the single control flow of a mixed-architecture application. The theoretical and practical trade-offs of the proposed model have been explored through the design of several systems. An instruction-accurate system simulator has been developed that supports the simulated execution of mixed-architecture applications. An embedded System-on-Chip implementation has been used to measure the overhead in hardware resources required to support the model, which was found to be minimal. An implementation of the model within an operating system on a tightly-coupled reconfigurable processor platform has been created. This implementation is used to extend the software scheduler to allow full support of mixed-architecture applications in a multitasking environment. Different scheduling strategies have been tested using this scheduler for mixed-architecture applications.
    The design and implementation of these systems has shown that a unified abstraction model for heterogeneous computers provides important usability benefits to system and application designers. These benefits are achieved through a consistent view of the multiple different architectures presented to the operating system and user applications. This allows them to focus on achieving their performance and efficiency goals by gaining the benefits of different execution models during runtime, without the complex implementation details of system-level synchronisation and coordination.
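The three MURAC abstractions, one instruction stream, one memory space, and architecture switches inside a single control flow, can be illustrated with a toy interpreter. The instruction set and the "accelerator" handler below are invented for the example; the real model targets reconfigurable hardware, not a Python function.

```python
# Toy sketch: a single instruction stream executes in one control flow, a
# "switch" instruction hands execution to another architecture, and both
# architectures read and write the same unified memory space.

def run(program, memory):
    arch = "cpu"                                 # stream starts in software
    for op, *args in program:
        if op == "switch":
            arch = args[0]                       # branch to another architecture
        elif op == "add":
            dst, a, b = args
            memory[dst] = memory[a] + memory[b]  # software-side operation
        elif op == "scale" and arch == "accel":
            dst, src, k = args
            memory[dst] = memory[src] * k        # pretend-accelerated operation
    return memory

mem = run(
    [("add", "t", "x", "y"),                     # software portion
     ("switch", "accel"),                        # hand over to the accelerator
     ("scale", "out", "t", 10)],                 # accelerated portion
    {"x": 2, "y": 3},
)
print(mem["out"])  # 50
```

The point of the sketch is that neither portion marshals data to the other: because memory is unified, the accelerated step simply sees the software step's result.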

    Online and offline processing of biological cell images on FPGAs and GPUs

    Get PDF
    When images are acquired by a high-throughput microscope, the sheer volume of image data means they must be processed by automated analysis. There are two approaches: offline processing, in which the images are processed on a cluster, and online processing, in which the pixel stream is processed directly from the sensors. To cope with the image data in offline processing, this work relies on graphics cards and demonstrates an implementation of Haralick texture feature extraction in CUDA, accelerating the algorithm by a factor of 1000 over a CPU solution. This enables the biologists to run further tests and gain insights more quickly. Online processing relies on FPGAs, which can be electrically connected to the sensors; here the algorithm should remain adaptable to the biologists' needs. This work presents the development of a prototype OpenCL-to-FPGA compiler: the biologists can write algorithms in OpenCL and translate them into a hardware design for the FPGA, something that would be too complex for them in a hardware description language. Besides its simplicity, the parallel language OpenCL has the advantage of portability to other architectures. If the FPGA compiler cannot translate an algorithm because of its existing restrictions, the OpenCL program can also be compiled for the GPUs used in offline processing.
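The offline path centres on Haralick texture features, which are statistics of a grey-level co-occurrence matrix (GLCM). The sketch below computes a horizontal-neighbour GLCM and two classic Haralick features (energy and contrast) for one tile; the number of grey levels and the pixel offset are assumptions, and the thesis's CUDA version would compute many such tiles in parallel.

```python
import numpy as np

def glcm(img, levels=8):
    """Normalised co-occurrence matrix for horizontally adjacent pixel pairs."""
    m = np.zeros((levels, levels))
    for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        m[a, b] += 1                             # count pair (left, right)
    return m / m.sum()                           # joint probabilities

def haralick_energy_contrast(p):
    """Two of Haralick's features from a normalised GLCM."""
    i, j = np.indices(p.shape)
    energy = (p ** 2).sum()                      # angular second moment
    contrast = ((i - j) ** 2 * p).sum()          # weighted grey-level spread
    return energy, contrast

tile = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [0, 2, 2, 2],
                 [2, 2, 3, 3]])
energy, contrast = haralick_energy_contrast(glcm(tile, levels=4))
print(round(energy, 3), round(contrast, 3))      # → 0.167 0.583
```

Each GLCM cell and each feature sum is independent, which is why this workload maps so well onto CUDA threads and yields the large speedup reported.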

    Functional programming languages in computing clouds: practical and theoretical explorations

    Get PDF
    Cloud platforms must integrate three pillars: messaging, coordination of workers, and data. This research investigates whether functional programming languages have any special merit when it comes to the implementation of cloud computing platforms. This thesis presents the lightweight message queue CMQ and the DSL CWMWL for the coordination of workers, which we use as artefacts to prove or disprove the special merit of functional programming languages in computing clouds. We have detailed the design and implementation with the broad aim of matching the notions and requirements of computing clouds. Our evaluation is based on criteria drawn from a series of comprehensive rationales and specifics that allow the FPL Haskell to be thoroughly analysed. We find that Haskell is excellent for use cases that do not require distribution of the application across the boundaries of (physical or virtual) systems, but not appropriate as a whole for the development of distributed cloud-based workloads that require communication with the far side and coordination of decoupled workloads. However, Haskell may yet qualify as a suitable vehicle, given future developments of formal mechanisms that embrace non-determinism in the underlying distributed environments, leading to applications that are anti-fragile rather than applications that insist on strict determinism, which can only be guaranteed on the local system or via slow blocking communication mechanisms.
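The pairing of a lightweight message queue with worker coordination can be illustrated in miniature: producers only enqueue messages, workers are decoupled consumers, and shutdown is coordinated by sentinel messages. Python's thread-safe `queue.Queue` stands in for CMQ and the dispatch loop for the CWMWL coordination layer; none of the names or APIs below come from the thesis.

```python
import queue
import threading

def worker(tasks, results):
    """A decoupled worker: consume messages until the shutdown sentinel."""
    while True:
        item = tasks.get()
        if item is None:                         # sentinel: shut this worker down
            break
        results.put(item * item)                 # the unit of work (square it)

tasks, results = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(4)]
for t in threads:
    t.start()
for n in range(10):                              # messaging: producers just enqueue
    tasks.put(n)
for _ in threads:                                # coordination: one sentinel each
    tasks.put(None)
for t in threads:
    t.join()
print(sorted(results.queue))                     # squares of 0..9, order-independent
```

In-process queues like this are exactly the regime where the thesis finds Haskell excels; the difficulties it reports begin once the queue must span machine boundaries.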
