446 research outputs found

    Improving Utility of GPU in Accelerating Industrial Applications with User-centred Automatic Code Translation

    Get PDF
    SMEs (Small and medium-sized enterprises), particularly those whose business is focused on developing innovative produces, are limited by a major bottleneck on the speed of computation in many applications. The recent developments in GPUs have been the marked increase in their versatility in many computational areas. But due to the lack of specialist GPU (Graphics processing units) programming skills, the explosion of GPU power has not been fully utilized in general SME applications by inexperienced users. Also, existing automatic CPU-to-GPU code translators are mainly designed for research purposes with poor user interface design and hard-to-use. Little attentions have been paid to the applicability, usability and learnability of these tools for normal users. In this paper, we present an online automated CPU-to-GPU source translation system, (GPSME) for inexperienced users to utilize GPU capability in accelerating general SME applications. This system designs and implements a directive programming model with new kernel generation scheme and memory management hierarchy to optimize its performance. A web-service based interface is designed for inexperienced users to easily and flexibly invoke the automatic resource translator. Our experiments with non-expert GPU users in 4 SMEs reflect that GPSME system can efficiently accelerate real-world applications with at least 4x and have a better applicability, usability and learnability than existing automatic CPU-to-GPU source translators

    ImageJ2: ImageJ for the next generation of scientific image data

    Full text link
    ImageJ is an image analysis program extensively used in the biological sciences and beyond. Due to its ease of use, recordable macro language, and extensible plug-in architecture, ImageJ enjoys contributions from non-programmers, amateur programmers, and professional developers alike. Enabling such a diversity of contributors has resulted in a large community that spans the biological and physical sciences. However, a rapidly growing user base, diverging plugin suites, and technical limitations have revealed a clear need for a concerted software engineering effort to support emerging imaging paradigms, to ensure the software's ability to handle the requirements of modern science. Due to these new and emerging challenges in scientific imaging, ImageJ is at a critical development crossroads. We present ImageJ2, a total redesign of ImageJ offering a host of new functionality. It separates concerns, fully decoupling the data model from the user interface. It emphasizes integration with external applications to maximize interoperability. Its robust new plugin framework allows everything from image formats, to scripting languages, to visualization to be extended by the community. The redesigned data model supports arbitrarily large, N-dimensional datasets, which are increasingly common in modern image acquisition. Despite the scope of these changes, backwards compatibility is maintained such that this new functionality can be seamlessly integrated with the classic ImageJ interface, allowing users and developers to migrate to these new methods at their own pace. ImageJ2 provides a framework engineered for flexibility, intended to support these requirements as well as accommodate future needs

    GSWO: A Programming Model for GPU-enabled Parallelization of Sliding Window Operations in Image Processing

    Get PDF
    Sliding Window Operations (SWOs) are widely used in image processing applications. They often have to be performed repeatedly across the target image, which can demand significant computing resources when processing large images with large windows. In applications in which real-time performance is essential, running these filters on a CPU often fails to deliver results within an acceptable timeframe. The emergence of sophisticated graphic processing units (GPUs) presents an opportunity to address this challenge. However, GPU programming requires a steep learning curve and is error-prone for novices, so the availability of a tool that can produce a GPU implementation automatically from the original CPU source code can provide an attractive means by which the GPU power can be harnessed effectively. This paper presents a GPUenabled programming model, called GSWO, which can assist GPU novices by converting their SWO-based image processing applications from the original C/C++ source code to CUDA code in a highly automated manner. This model includes a new set of simple SWO pragmas to generate GPU kernels and to support effective GPU memory management. We have implemented this programming model based on a CPU-to-GPU translator (C2GPU). Evaluations have been performed on a number of typical SWO image filters and applications. The experimental results show that the GSWO model is capable of efficiently accelerating these applications, with improved applicability and a speed-up of performance compared to several leading CPU-to- GPU source-to-source translators

    Supercomputing Frontiers

    Get PDF
    This open access book constitutes the refereed proceedings of the 7th Asian Conference Supercomputing Conference, SCFA 2022, which took place in Singapore in March 2022. The 8 full papers presented in this book were carefully reviewed and selected from 21 submissions. They cover a range of topics including file systems, memory hierarchy, HPC cloud platform, container image configuration workflow, large-scale applications, and scheduling

    Serverless Computing Strategies on Cloud Platforms

    Full text link
    [ES] Con el desarrollo de la Computación en la Nube, la entrega de recursos virtualizados a través de Internet ha crecido enormemente en los últimos años. Las Funciones como servicio (FaaS), uno de los modelos de servicio más nuevos dentro de la Computación en la Nube, permite el desarrollo e implementación de aplicaciones basadas en eventos que cubren servicios administrados en Nubes públicas y locales. Los proveedores públicos de Computación en la Nube adoptan el modelo FaaS dentro de su catálogo para proporcionar computación basada en eventos altamente escalable para las aplicaciones. Por un lado, los desarrolladores especializados en esta tecnología se centran en crear marcos de código abierto serverless para evitar el bloqueo con los proveedores de la Nube pública. A pesar del desarrollo logrado por la informática serverless, actualmente hay campos relacionados con el procesamiento de datos y la optimización del rendimiento en la ejecución en los que no se ha explorado todo el potencial. En esta tesis doctoral se definen tres estrategias de computación serverless que permiten evidenciar los beneficios de esta tecnología para el procesamiento de datos. Las estrategias implementadas permiten el análisis de datos con la integración de dispositivos de aceleración para la ejecución eficiente de aplicaciones científicas en plataformas cloud públicas y locales. En primer lugar, se desarrolló la plataforma CloudTrail-Tracker. CloudTrail-Tracker es una plataforma serverless de código abierto basada en eventos para el procesamiento de datos que puede escalar automáticamente hacia arriba y hacia abajo, con la capacidad de escalar a cero para minimizar los costos operativos. Seguidamente, se plantea la integración de GPUs en una plataforma serverless local impulsada por eventos para el procesamiento de datos escalables. La plataforma admite la ejecución de aplicaciones como funciones severless en respuesta a la carga de un archivo en un sistema de almacenamiento de ficheros, lo que permite la ejecución en paralelo de las aplicaciones según los recursos disponibles. Este procesamiento es administrado por un cluster Kubernetes elástico que crece y decrece automáticamente según las necesidades de procesamiento. Ciertos enfoques basados en tecnologías de virtualización de GPU como rCUDA y NVIDIA-Docker se evalúan para acelerar el tiempo de ejecución de las funciones. Finalmente, se implementa otra solución basada en el modelo serverless para ejecutar la fase de inferencia de modelos de aprendizaje automático previamente entrenados, en la plataforma de Amazon Web Services y en una plataforma privada con el framework OSCAR. El sistema crece elásticamente de acuerdo con la demanda y presenta una escalado a cero para minimizar los costes. Por otra parte, el front-end proporciona al usuario una experiencia simplificada en la obtención de la predicción de modelos de aprendizaje automático. Para demostrar las funcionalidades y ventajas de las soluciones propuestas durante esta tesis se recogen varios casos de estudio que abarcan diferentes campos del conocimiento como la analítica de aprendizaje y la Inteligencia Artificial. Esto demuestra que la gama de aplicaciones donde la computación serverless puede aportar grandes beneficios es muy amplia. Los resultados obtenidos avalan el uso del modelo serverless en la simplificación del diseño de arquitecturas para el uso intensivo de datos en aplicaciones complejas.[CA] Amb el desenvolupament de la Computació en el Núvol, el lliurament de recursos virtualitzats a través d'Internet ha crescut granment en els últims anys. Les Funcions com a Servei (FaaS), un dels models de servei més nous dins de la Computació en el Núvol, permet el desenvolupament i implementació d'aplicacions basades en esdeveniments que cobreixen serveis administrats en Núvols públics i locals. Els proveïdors de computació en el Núvol públic adopten el model FaaS dins del seu catàleg per a proporcionar a les aplicacions computació altament escalable basada en esdeveniments. D'una banda, els desenvolupadors especialitzats en aquesta tecnologia se centren en crear marcs de codi obert serverless per a evitar el bloqueig amb els proveïdors del Núvol públic. Malgrat el desenvolupament alcançat per la informàtica serverless, actualment hi ha camps relacionats amb el processament de dades i l'optimització del rendiment d'execució en els quals no s'ha explorat tot el potencial. En aquesta tesi doctoral es defineixen tres estratègies informàtiques serverless que permeten demostrar els beneficis d'aquesta tecnologia per al processament de dades. Les estratègies implementades permeten l'anàlisi de dades amb a integració de dispositius accelerats per a l'execució eficient d'aplicacion scientífiques en plataformes de Núvol públiques i locals. En primer lloc, es va desenvolupar la plataforma CloudTrail-Tracker. CloudTrail-Tracker és una plataforma de codi obert basada en esdeveniments per al processament de dades serverless que pot escalar automáticament cap amunt i cap avall, amb la capacitat d'escalar a zero per a minimitzar els costos operatius. A continuació es planteja la integració de GPUs en una plataforma serverless local impulsada per esdeveniments per al processament de dades escalables. La plataforma admet l'execució d'aplicacions com funcions severless en resposta a la càrrega d'un arxiu en un sistema d'emmagatzemaments de fitxers, la qual cosa permet l'execució en paral·lel de les aplicacions segon sels recursos disponibles. Este processament és administrat per un cluster Kubernetes elàstic que creix i decreix automàticament segons les necessitats de processament. Certs enfocaments basats en tecnologies de virtualització de GPU com rCUDA i NVIDIA-Docker s'avaluen per a accelerar el temps d'execució de les funcions. Finalment s'implementa una altra solució basada en el model serverless per a executar la fase d'inferència de models d'aprenentatge automàtic prèviament entrenats en la plataforma de Amazon Web Services i en una plataforma privada amb el framework OSCAR. El sistema creix elàsticament d'acord amb la demanda i presenta una escalada a zero per a minimitzar els costos. D'altra banda el front-end proporciona a l'usuari una experiència simplificada en l'obtenció de la predicció de models d'aprenentatge automàtic. Per a demostrar les funcionalitats i avantatges de les solucions proposades durant esta tesi s'arrepleguen diversos casos d'estudi que comprenen diferents camps del coneixement com l'analítica d'aprenentatge i la Intel·ligència Artificial. Això demostra que la gamma d'aplicacions on la computació serverless pot aportar grans beneficis és molt àmplia. Els resultats obtinguts avalen l'ús del model serverless en la simplificació del disseny d'arquitectures per a l'ús intensiu de dades en aplicacions complexes.[EN] With the development of Cloud Computing, the delivery of virtualized resources over the Internet has greatly grown in recent years. Functions as a Service (FaaS), one of the newest service models within Cloud Computing, allows the development and implementation of event-based applications that cover managed services in public and on-premises Clouds. Public Cloud Computing providers adopt the FaaS model within their catalog to provide event-driven highly-scalable computing for applications. On the one hand, developers specialized in this technology focus on creating open-source serverless frameworks to avoid the lock-in with public Cloud providers. Despite the development achieved by serverless computing, there are currently fields related to data processing and execution performance optimization where the full potential has not been explored. In this doctoral thesis three serverless computing strategies are defined that allow to demonstrate the benefits of this technology for data processing. The implemented strategies allow the analysis of data with the integration of accelerated devices for the efficient execution of scientific applications on public and on-premises Cloud platforms. Firstly, the CloudTrail-Tracker platform was developed to extract and process learning analytics in the Cloud. CloudTrail-Tracker is an event-driven open-source platform for serverless data processing that can automatically scale up and down, featuring the ability to scale to zero for minimizing the operational costs. Next, the integration of GPUs in an event-driven on-premises serverless platform for scalable data processing is discussed. The platform supports the execution of applications as serverless functions in response to the loading of a file in a file storage system, which allows the parallel execution of applications according to available resources. This processing is managed by an elastic Kubernetes cluster that automatically grows and shrinks according to the processing needs. Certain approaches based on GPU virtualization technologies such as rCUDA and NVIDIA-Docker are evaluated to speed up the execution time of the functions. Finally, another solution based on the serverless model is implemented to run the inference phase of previously trained machine learning models on theAmazon Web Services platform and in a private platform with the OSCAR framework. The system grows elastically according to demand and is scaled to zero to minimize costs. On the other hand, the front-end provides the user with a simplified experience in obtaining the prediction of machine learning models. To demonstrate the functionalities and advantages of the solutions proposed during this thesis, several case studies are collected covering different fields of knowledge such as learning analytics and Artificial Intelligence. This shows the wide range of applications where serverless computing can bring great benefits. The results obtained endorse the use of the serverless model in simplifying the design of architectures for the intensive data processing in complex applications.Naranjo Delgado, DM. (2021). Serverless Computing Strategies on Cloud Platforms [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/160916TESI

    GPU Computing for Cognitive Robotics

    Get PDF
    This thesis presents the first investigation of the impact of GPU computing on cognitive robotics by providing a series of novel experiments in the area of action and language acquisition in humanoid robots and computer vision. Cognitive robotics is concerned with endowing robots with high-level cognitive capabilities to enable the achievement of complex goals in complex environments. Reaching the ultimate goal of developing cognitive robots will require tremendous amounts of computational power, which was until recently provided mostly by standard CPU processors. CPU cores are optimised for serial code execution at the expense of parallel execution, which renders them relatively inefficient when it comes to high-performance computing applications. The ever-increasing market demand for high-performance, real-time 3D graphics has evolved the GPU into a highly parallel, multithreaded, many-core processor extraordinary computational power and very high memory bandwidth. These vast computational resources of modern GPUs can now be used by the most of the cognitive robotics models as they tend to be inherently parallel. Various interesting and insightful cognitive models were developed and addressed important scientific questions concerning action-language acquisition and computer vision. While they have provided us with important scientific insights, their complexity and application has not improved much over the last years. The experimental tasks as well as the scale of these models are often minimised to avoid excessive training times that grow exponentially with the number of neurons and the training data. This impedes further progress and development of complex neurocontrollers that would be able to take the cognitive robotics research a step closer to reaching the ultimate goal of creating intelligent machines. This thesis presents several cases where the application of the GPU computing on cognitive robotics algorithms resulted in the development of large-scale neurocontrollers of previously unseen complexity enabling the conducting of the novel experiments described herein.European Commission Seventh Framework Programm

    Accessible software frameworks for reproducible image analysis of host-pathogen interactions

    Get PDF
    Um die Mechanismen hinter lebensgefährlichen Krankheiten zu verstehen, müssen die zugrundeliegenden Interaktionen zwischen den Wirtszellen und krankheitserregenden Mikroorganismen bekannt sein. Die kontinuierlichen Verbesserungen in bildgebenden Verfahren und Computertechnologien ermöglichen die Anwendung von Methoden aus der bildbasierten Systembiologie, welche moderne Computeralgorithmen benutzt um das Verhalten von Zellen, Geweben oder ganzen Organen präzise zu messen. Um den Standards des digitalen Managements von Forschungsdaten zu genügen, müssen Algorithmen den FAIR-Prinzipien (Findability, Accessibility, Interoperability, and Reusability) entsprechen und zur Verbreitung ebenjener in der wissenschaftlichen Gemeinschaft beitragen. Dies ist insbesondere wichtig für interdisziplinäre Teams bestehend aus Experimentatoren und Informatikern, in denen Computerprogramme zur Verbesserung der Kommunikation und schnellerer Adaption von neuen Technologien beitragen können. In dieser Arbeit wurden daher Software-Frameworks entwickelt, welche dazu beitragen die FAIR-Prinzipien durch die Entwicklung von standardisierten, reproduzierbaren, hochperformanten, und leicht zugänglichen Softwarepaketen zur Quantifizierung von Interaktionen in biologischen System zu verbreiten. Zusammenfassend zeigt diese Arbeit wie Software-Frameworks zu der Charakterisierung von Interaktionen zwischen Wirtszellen und Pathogenen beitragen können, indem der Entwurf und die Anwendung von quantitativen und FAIR-kompatiblen Bildanalyseprogrammen vereinfacht werden. Diese Verbesserungen erleichtern zukünftige Kollaborationen mit Lebenswissenschaftlern und Medizinern, was nach dem Prinzip der bildbasierten Systembiologie zur Entwicklung von neuen Experimenten, Bildgebungsverfahren, Algorithmen, und Computermodellen führen wird
    corecore