
    Data handover on a peer-to-peer system

    This paper presents the DHO API and its integration into a peer-to-peer Grid architecture. It provides efficient management of critical data resources in an extensible distributed setting consisting of a set of peers that may join or leave the system. Locking and mapping of such resources are handled transparently for users, who may access them through simple function calls. On the lowest level of the proposed architecture, the Exclusive Locks for Mobile Processes (ELMP) algorithm ensures data consistency and guarantees the logical order of requests. All operations in our architecture have an amortized cost of O(log n). An experimental assessment validates the practicality of our proposal.
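    The abstract does not list the DHO calls themselves, so the following is a minimal sketch of what a handover cycle behind such "simple function calls" could look like; every name and the single-process mock bodies are illustrative assumptions, not the actual DHO API.

    /* Hypothetical sketch of a DHO-style handover cycle. The bodies are a
     * single-process mock: a real implementation would route the request
     * through the ELMP overlay and map the remote data locally. */
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        char data[256];     /* the critical resource; really lives on some peer */
        int  locked;        /* stands in for the distributed ELMP lock state */
    } dho_t;

    /* Enqueue an exclusive-write request; ELMP fixes the logical order. */
    static int dho_ew_request(dho_t *h) { (void)h; return 0; }
    /* Block until the lock arrives, then map the data for direct access. */
    static char *dho_ew_acquire(dho_t *h) { h->locked = 1; return h->data; }
    /* Unmap the data and hand the lock to the next requester. */
    static void dho_ew_release(dho_t *h) { h->locked = 0; }

    int main(void)
    {
        dho_t res = { "initial", 0 };
        dho_ew_request(&res);                 /* each step is amortized O(log n) */
        char *p = dho_ew_acquire(&res);       /* exclusive-write critical section begins */
        strcpy(p, "updated by this peer");
        dho_ew_release(&res);                 /* handover to the next peer in ELMP order */
        printf("%s\n", res.data);
        return 0;
    }

    The point of the pattern is that request, blocking acquire and release are the only operations a user ever sees; the ELMP ordering and the logarithmic routing stay hidden behind them.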

    Platform to Assist Medical Experts in Training, Application, and Control of Machine Learning Models Using Patient Data from a Clinical Information System

    In recent years, clinical data scientists have achieved major breakthroughs in advancing machine learning models for the medical domain, and these models have great potential to assist medical experts. Machine learning models can be leveraged to assist medical experts in tasks such as analyzing and diagnosing patient data, for example from computed tomography scans. However, it is a challenge to translate the latest advancements from academic research fields such as computer science and physics into clinical practice. For this purpose, clinical data scientists and medical experts need to collaborate closely. This thesis tackles the challenges of accessibility and usability of state-of-the-art machine learning models, as well as the design of a scalable computing architecture; conceptual ideas for possible strategies, as well as a prototype of such a machine learning platform, are presented. A systematic literature review was conducted on current approaches to creating medical machine learning platforms, the management of machine learning models, and the version management of large data sets. Afterward, the functional and nonfunctional requirements of the new machine learning platform were elicited as part of the requirements analysis. Two streamlined workflows, one for clinical data scientists and one for medical experts, were derived from the requirements analysis. The workflow for clinical data scientists includes steps to define, train, and share machine learning methods, including pre- and postprocessing modules, and to manage data sets. Medical experts are able to analyze patient data using predefined machine learning methods. Building on the results of these analyses, the architecture of the platform was derived. It consists of a scalable infrastructure stack, a lightweight and easy-to-use web interface, and a backend component providing the required functionality. The final design decisions solve the issue of efficiently standardizing, parallelizing, and applying machine learning workflows within a scalable computing infrastructure. The proposed platform was evaluated with 22 participants: clinical data scientists (N=12) and medical experts (N=10). Both groups were asked to rate specific workflows of the platform as well as the platform as a whole, and to provide additional ideas and feedback. 92% of the medical experts and 90% of the clinical data scientists rated their overall impression of the platform as very good, and the two groups strongly agreed that the platform facilitates method development and collaboration (92% and 90%, respectively). The conducted expert survey suggests that the platform proposed here could be used to develop, optimize, and apply machine learning methods in the medical domain and beyond, easing the collaboration between medical experts and clinical data scientists.

    3rd EGEE User Forum

    We have organized this book as a sequence of chapters, each associated with an application or technical theme and introduced by an overview of its contents and a summary of the main conclusions coming from the Forum for that topic. The first chapter gathers all the plenary session keynote addresses; following this there is a sequence of chapters covering the application-flavoured sessions, which are in turn followed by chapters with the flavour of Computer Science and Grid Technology. The final chapter covers the large number of practical demonstrations and posters exhibited at the Forum. Much of the work presented has a direct link to specific areas of Science, and so we have created a Science Index, presented below. In addition, at the end of this book, we provide a complete list of the institutes and countries involved in the User Forum.

    Generic access to symbolic computing services

    Symbolic computation is one of the computational domains that require large computational resources. Computer Algebra Systems (CAS), the main tools used for symbolic computations, are mostly designed as software tools installed on standalone machines, which do not provide the resources required for solving large symbolic computation problems. In order to support such computations, an infrastructure built upon massively distributed computational environments must be developed. Building an infrastructure for symbolic computations requires a thorough analysis of the most important requirements raised by the symbolic computation world, and the infrastructure must be built on the most suitable architectural styles and technologies. The architecture that we propose is composed of several main components: the Computer Algebra System (CAS) Server, which exposes the functionality implemented by one or more supporting CASs through generic Grid Service interfaces; the Architecture for Grid Symbolic Services Orchestration (AGSSO) Server, which allows seamless composition of CAS Server capabilities; and client-side libraries that assist users in describing workflows for symbolic computations directly within the CAS environment. We have also designed and developed a framework for automatic data management of mathematical content that relies on OpenMath encoding. To support the validation and fine-tuning of the system, we have developed a simulation platform that mimics the environment on which the architecture is deployed.
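    As a rough illustration of the "generic interface" idea, the sketch below maps a service name to whichever backend CAS implements it, exchanging OpenMath strings throughout; all names, types and the stubbed backends are assumptions made for illustration, not the actual CAS Server or AGSSO interfaces.

    /* Minimal sketch: one generic entry point, many CAS backends, OpenMath
     * payloads. A real CAS Server would expose this as a Grid Service. */
    #include <stdio.h>
    #include <string.h>

    typedef const char *(*cas_fn)(const char *openmath_in);

    /* Backend adapters: each wraps one supporting CAS (stubbed here). */
    static const char *maple_factor(const char *in) { (void)in; return "<OMOBJ><!-- factored by Maple --></OMOBJ>"; }
    static const char *gap_factor(const char *in)   { (void)in; return "<OMOBJ><!-- factored by GAP --></OMOBJ>"; }

    struct service { const char *name; cas_fn impl; };

    static struct service registry[] = {      /* all exposed through one generic interface */
        { "factor.maple", maple_factor },
        { "factor.gap",   gap_factor },
    };

    /* Generic entry point: clients name a service and pass OpenMath in and out. */
    static const char *cas_server_invoke(const char *service, const char *openmath_in)
    {
        for (size_t i = 0; i < sizeof registry / sizeof *registry; i++)
            if (strcmp(registry[i].name, service) == 0)
                return registry[i].impl(openmath_in);
        return NULL;                          /* unknown service */
    }

    int main(void)
    {
        const char *out = cas_server_invoke("factor.gap", "<OMOBJ><!-- x^2-1 --></OMOBJ>");
        puts(out ? out : "no such service");
        return 0;
    }

    Because clients only ever see service names and OpenMath, an orchestrator like the AGSSO Server can compose such calls into workflows without caring which CAS sits behind each one.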

    Efficient multilevel scheduling in grids and clouds with dynamic provisioning

    Thesis defended at the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, on 12-01-2016.
    The consolidation of large Distributed Computing infrastructures has resulted in a High-Throughput Computing platform that is ready for high loads, whose best proponents are the current grid federations. On the other hand, Cloud Computing promises to be more flexible, usable, available and simple than Grid Computing, and it covers many more computational needs than those required to carry out distributed calculations. In any case, because of the dynamism and heterogeneity present in grids and clouds, calculating the best match between computational tasks and resources in an effectively characterised infrastructure is, by definition, an NP-complete problem, and only sub-optimal solutions (schedules) can be found for these environments. Nevertheless, the characterisation of the resources of both kinds of infrastructure is far from being achieved. The available information systems do not provide accurate data about the status of the resources, which prevents the advanced scheduling required by the different needs of distributed applications. This issue was not solved for grids during the last decade, and the recently established cloud infrastructures suffer from the same problem.
    In this framework, brokers can only improve the throughput of very long calculations, but they do not provide estimations of their duration. Complex scheduling has traditionally been tackled by other tools such as workflow managers, self-schedulers and the production management systems of certain research communities. Nevertheless, the low performance achieved by these early-binding methods is notorious. Moreover, the diversity of cloud providers and, above all, their lack of standardised programming interfaces and brokering tools to distribute the workload hinder the massive portability of legacy applications to cloud environments...
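    Since the optimal matching is NP-complete, schedulers settle for cheap heuristics. The sketch below shows one generic example, a greedy earliest-completion-time assignment; it is chosen purely for illustration and is not the scheduling algorithm proposed in the thesis.

    /* Illustrative greedy heuristic for the matching problem described above:
     * assign each task to the resource with the earliest estimated completion
     * time, given (possibly inaccurate) resource characterisations. */
    #include <stdio.h>

    #define NTASKS 5
    #define NRES   3

    int main(void)
    {
        double task_work[NTASKS] = { 4.0, 2.5, 6.0, 1.0, 3.5 };  /* estimated work units */
        double res_speed[NRES]   = { 1.0, 2.0, 0.5 };            /* work units per second */
        double res_free_at[NRES] = { 0.0, 0.0, 0.0 };            /* when each resource frees up */

        for (int t = 0; t < NTASKS; t++) {
            int best = 0;
            double best_done = 0.0;
            for (int r = 0; r < NRES; r++) {
                /* estimated completion time = current queue end + runtime */
                double done = res_free_at[r] + task_work[t] / res_speed[r];
                if (r == 0 || done < best_done) { best = r; best_done = done; }
            }
            res_free_at[best] = best_done;    /* commit the sub-optimal assignment */
            printf("task %d -> resource %d (done at %.1f)\n", t, best, best_done);
        }
        return 0;
    }

    The quality of such a schedule hinges entirely on the accuracy of res_speed and res_free_at, which is precisely the resource-characterisation gap in grid and cloud information systems that the abstract points out.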

    Applications Development for the Computational Grid


    Standardized multi-protocol data management for grid and cloud GridRPC frameworks

    GridRPC is an international standard of the Open Grid Forum defining an API designed to allow applications to be submitted in a seamless way on large-scale, heterogeneous and geographically distributed computing platforms. The first versions of the standard did not take any data management features into account. Data were plain parameters of the remote procedure calls, with no possibility to prefetch them or to use persistence, replication, external sources, etc., which made GridRPC codes middleware dependent. The data extension of the standard introduced a short set of functions and data structures to complete the API with simple but powerful data management features. In this paper, we present a modular and extensible implementation of both APIs, which requires only modest development effort to be usable with any middleware relying on RPC, and which gives access to numerous, easily extended protocols and data middleware for accessing data. With these data management functions, it opens up the optimization potential that such an approach offers to large-scale applications.
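    To make the contrast concrete, here is a hedged sketch of a client that passes a data handle instead of a plain call parameter. The core calls (grpc_initialize, grpc_function_handle_init, grpc_call, grpc_finalize) follow the OGF GridRPC API, but the grpc_data_* usage is only an approximation of the data extension's style, and the server, service and URI names are placeholders; treat the exact signatures as assumptions to check against the target middleware.

    /* Sketch of a GridRPC client using the data-management extension. */
    #include <stdio.h>
    #include "grpc.h"

    int main(void)
    {
        grpc_function_handle_t handle;
        grpc_data_t matrix;

        if (grpc_initialize("client.cfg") != GRPC_NO_ERROR) {  /* read middleware config */
            fprintf(stderr, "GridRPC initialization failed\n");
            return 1;
        }

        /* Bind to a remote service; both names are placeholders. */
        grpc_function_handle_init(&handle, "server.example.org", "matrix/multiply");

        /* Declare the input once as a data handle instead of a plain call
         * parameter: the middleware may then prefetch it, keep it persistent
         * across calls, or fetch it from an external source. Signature is
         * an assumption modeled on the data extension, not verbatim API. */
        grpc_data_init(&matrix, "http://data.example.org/A.dat", NULL);

        grpc_call(&handle, &matrix);          /* data travels by reference, not by value */

        grpc_data_free(&matrix);              /* also an assumed name for the cleanup call */
        grpc_function_handle_destruct(&handle);
        grpc_finalize();
        return 0;
    }

    The design payoff named in the abstract follows from this separation: because the data handle, not the call, names the sources and destinations, the same client code runs unchanged over different protocols and data middleware.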