23 research outputs found

    Toward High-Performance Computing and Big Data Analytics Convergence: The Case of Spark-DIY

    Get PDF
    Convergence between high-performance computing (HPC) and big data analytics (BDA) is currently an established research area that has spawned new opportunities for unifying the platform layer and data abstractions in these ecosystems. This work presents an architectural model that enables the interoperability of established BDA and HPC execution models, reflecting the key design features that interest both the HPC and BDA communities, and including an abstract data collection and operational model that provides a unified interface for hybrid applications. This architecture can be implemented in different ways depending on the process- and data-centric platforms of choice and the mechanisms put in place to effectively meet the requirements of the architecture. The Spark-DIY platform is introduced in the paper as a prototype implementation of the proposed architecture. It preserves the interfaces and execution environment of the popular BDA platform Apache Spark, making it compatible with any Spark-based application and tool, while providing efficient communication and kernel execution via DIY, a powerful communication pattern library built on top of MPI. Spark-DIY is then evaluated in terms of performance on a representative use case from the hydrogeology domain, EnKF-HGS. This application is a clear example of how current HPC simulations are evolving toward hybrid HPC-BDA applications, integrating HPC simulations within a BDA environment. This work was supported in part by the Spanish Ministry of Economy, Industry and Competitiveness under Grant TIN2016-79637-P (Toward Unification of HPC and Big Data Paradigms), in part by the Spanish Ministry of Education under Grant FPU15/00422 (Training Program for Academic and Teaching Staff), in part by the Advanced Scientific Computing Research program, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357, and in part by the DOE under Agreement DE-DC000122495 (Program Manager: Laura Biven).
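
    The following is a minimal, hypothetical sketch of the kind of hybrid pipeline this architecture targets, written with stock PySpark only: the compute-heavy kernel that Spark-DIY would hand off to DIY/MPI is represented here by a plain Python placeholder, since the actual Spark-DIY bindings are not described in the abstract.

    # Hypothetical sketch: a Spark pipeline whose per-partition kernel would, in
    # Spark-DIY, be dispatched to an MPI/DIY executor. Here the kernel is a plain
    # Python placeholder so the example runs with stock PySpark.
    from pyspark.sql import SparkSession

    def hpc_kernel(partition):
        # Placeholder for a compute-heavy simulation kernel (DIY/MPI in Spark-DIY).
        for record in partition:
            yield record * 2.0

    spark = SparkSession.builder.appName("hybrid-hpc-bda-sketch").getOrCreate()
    rdd = spark.sparkContext.parallelize(range(1_000_000), numSlices=64)
    result = rdd.mapPartitions(hpc_kernel).sum()
    print(result)
    spark.stop()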

    A cloudification methodology for high performance simulations

    Get PDF
    International Mention in the doctoral degree. Many scientific areas make extensive use of computer simulations to study complex real-world processes. These computations are typically very resource-intensive and present scalability issues as experiments get larger, even on dedicated supercomputers, since these are limited by their own hardware resources. Cloud computing arises as an option to move toward the ideal of unlimited scalability by providing virtually infinite resources, yet applications must be adapted to this paradigm. The major goal of this thesis is to analyze the suitability of performing simulations in clouds through a paradigm shift, from classic parallel approaches to data-centric models, in those applications where that is possible. The aim is to maintain the scalability achieved in traditional HPC infrastructures while taking advantage of the features of the cloud computing paradigm. The thesis also explores the characteristics that make simulators suitable or unsuitable for deployment on HPC or cloud infrastructures, defining a generic architecture and extracting common elements present in the majority of simulators. As a result, we propose a generalist cloudification methodology based on the MapReduce paradigm to migrate high-performance simulations to the cloud and provide greater scalability. We analysed its viability by applying it to a real engineering simulator and running the resulting implementation on HPC and cloud environments. Our evaluations aim to show that the cloudified application is highly scalable, that there is still a large margin to improve the theoretical model and its implementations, and that the methodology can be extended to a wider range of simulations.
    Funding: Administrador de Infraestructuras Ferroviarias (ADIF), Estudio y realización de programas de cálculo de pórticos rígidos de catenaria (CALPOR) y de sistema de simulación de montaje de agujas aéreas de línea aérea de contacto (SIA), JM/RS 3.6/4100.0685-9/00100 – Administrador de Infraestructuras Ferroviarias (ADIF), Proyecto para la Investigación sobre la aplicación de las TIC a la innovación de las diferentes infraestructuras correspondientes a las instalaciones de electrificación y suministro de energía (SIRTE), JM/RS 3.9/1500.0009/0-00000 – Spanish Ministry of Education, TIN2010-16497, Scalable Input/Output techniques for high-performance distributed and parallel computing environments – Spanish Ministry of Economics and Competitiveness, TIN2013-41350-P, Técnicas de gestión escalable de datos para high-end computing systems – European Union, COST Action IC1305, "Network for Sustainable Ultrascale Computing Platforms" (NESUS) – European Union, COST Action IC0805, "Open European Network for High Performance Computing on Complex Environments" – Spanish Ministry of Economics and Competitiveness, TIN2011-15734-E, Red de Computación de Altas Prestaciones sobre Arquitecturas Paralelas Heterogéneas (CAPAP-H). Official Doctoral Program in Computer Science and Technology. Thesis committee: Domenica Talia (chair), José Daniel García Sánchez (chair), José Manuel Moya Fernández (secretary).
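
    As a toy illustration of the map/reduce decomposition the methodology relies on (not the thesis's actual simulator or implementation), each map task below runs an independent slice of a placeholder "simulation" and the reduce step aggregates the partial results.

    # Toy illustration of recasting a simulation as MapReduce: the domain is split
    # into independent chunks (map) and partial results are combined (reduce).
    # The simulate_chunk kernel is a placeholder, not the engineering simulator
    # used in the thesis.
    from functools import reduce
    from multiprocessing import Pool

    def simulate_chunk(chunk):
        # Stand-in for a compute-intensive per-chunk simulation step.
        return sum(x * x for x in chunk)

    def combine(a, b):
        # Reduce step: aggregate partial results from independent chunks.
        return a + b

    if __name__ == "__main__":
        domain = list(range(1_000_000))
        chunks = [domain[i:i + 10_000] for i in range(0, len(domain), 10_000)]
        with Pool() as pool:
            partials = pool.map(simulate_chunk, chunks)   # map phase
        total = reduce(combine, partials, 0)              # reduce phase
        print(total)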

    Browser-based Data Annotation, Active Learning, and Real-Time Distribution of Artificial Intelligence Models: From Tumor Tissue Microarrays to COVID-19 Radiology.

    Get PDF
    BACKGROUND: Artificial intelligence (AI) is fast becoming the tool of choice for scalable and reliable analysis of medical images. However, constraints in sharing medical data outside the institutional or geographical space, as well as difficulties in getting AI models and modeling platforms to work across different environments, have led to a "reproducibility crisis" in digital medicine. METHODS: This study details the implementation of a web platform that can be used to mitigate these challenges by orchestrating a digital pathology AI pipeline, from raw data to model inference, entirely on the local machine. We discuss how this federated platform provides governed access to data by consuming the Application Program Interfaces exposed by cloud storage services, allows the addition of user-defined annotations, facilitates active learning for training models iteratively, and provides model inference computed directly in the web browser at practically zero cost. The latter is of particular relevance to clinical workflows because the code, including the AI model, travels to the user's data, which stays private to the governance domain where it was acquired. RESULTS: We demonstrate that the web browser can be a means of democratizing AI and advancing data socialization in medical imaging backed by consumer-facing cloud infrastructure such as Box.com. As a case study, we test the accompanying platform end-to-end on a large dataset of digital breast cancer tissue microarray core images. We also showcase how it can be applied in contexts separate from digital pathology by applying it to a radiology dataset containing COVID-19 computed tomography images. CONCLUSIONS: The platform described in this report resolves the challenges to the findable, accessible, interoperable, reusable stewardship of data and AI models by integrating with cloud storage to maintain user-centric governance over the data. It also enables distributed, federated computation for AI inference over those data and proves the viability of client-side AI in medical imaging. AVAILABILITY: The open-source application is publicly available at , with a short video demonstration at
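
    For illustration only, the following is a generic uncertainty-sampling loop of the kind that active-learning workflows like the one described above support, using scikit-learn on synthetic data; it is not the platform's code, and all names and values are placeholders.

    # Toy uncertainty-sampling active-learning loop (scikit-learn, synthetic data).
    # Illustrates the iterate-annotate-retrain cycle described above; it is not
    # the platform's implementation.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    labeled = list(range(20))                      # a few initially "annotated" samples
    unlabeled = [i for i in range(len(X)) if i not in labeled]

    for round_ in range(5):
        model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
        probs = model.predict_proba(X[unlabeled])
        uncertainty = 1.0 - probs.max(axis=1)      # least-confident sampling
        query = [unlabeled[i] for i in np.argsort(uncertainty)[-10:]]
        labeled += query                           # simulate the user annotating them
        unlabeled = [i for i in unlabeled if i not in query]
        print(f"round {round_}: accuracy {model.score(X, y):.3f}")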

    Myths and Legends in High-Performance Computing

    Full text link
    In this thought-provoking article, we discuss certain myths and legends that are folklore among members of the high-performance computing community. We gathered these myths from conversations at conferences and meetings, product advertisements, papers, and other communications such as tweets, blogs, and news articles within and beyond our community. We believe they represent the zeitgeist of the current era of massive change, driven by the end of many scaling laws such as Dennard scaling and Moore's law. While some laws end, new directions are emerging, such as algorithmic scaling or novel architecture research. Nevertheless, these myths are rarely based on scientific facts, but rather on some evidence or argumentation. In fact, we believe that this is the very reason for the existence of many myths and why they cannot be answered clearly. While it feels like there should be clear answers for each, some may remain endless philosophical debates, such as whether Beethoven was better than Mozart. We would like to see our collection of myths as a discussion of possible new directions for research and industry investment.

    Rise of the Planet of Serverless Computing: A Systematic Review

    Get PDF
    Serverless computing is an emerging cloud computing paradigm being adopted to develop a wide range of software applications. It allows developers to focus on the application logic at the granularity of a function, thereby freeing them from tedious and error-prone infrastructure management. Meanwhile, its unique characteristics pose new challenges to the development and deployment of serverless-based applications, and enormous research efforts have been devoted to tackling them. This paper provides a comprehensive literature review to characterize the current state of serverless computing research. Specifically, it covers 164 papers across 17 research directions of serverless computing, including performance optimization, programming frameworks, application migration, multi-cloud development, testing and debugging, etc. It also identifies research trends, focus areas, and commonly used platforms for serverless computing, as well as promising research opportunities.
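
    To make the "granularity of a function" concrete, here is a minimal handler in the style of AWS Lambda's Python runtime; the event fields and function name are illustrative and not drawn from the surveyed papers.

    # Minimal function-granularity unit in the style of AWS Lambda's Python runtime.
    # The developer writes only this handler; provisioning, scaling, and routing
    # are left to the platform. The event fields below are illustrative.
    import json

    def handler(event, context):
        name = event.get("name", "world")
        return {
            "statusCode": 200,
            "body": json.dumps({"message": f"hello, {name}"}),
        }

    if __name__ == "__main__":
        # Local invocation for testing; in production the platform calls handler().
        print(handler({"name": "serverless"}, None))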

    Machine learning as a service for high energy physics (MLaaS4HEP): a service for ML-based data analyses

    Get PDF
    With the CERN LHC program underway, data growth in the High Energy Physics (HEP) field has accelerated, and the use of Machine Learning (ML) in HEP will be critical during the HL-LHC program, when the data produced will reach the exascale. ML techniques have been used successfully in many areas of HEP; nevertheless, developing an ML project and implementing it for production use is a highly time-consuming task that requires specific skills. Complicating this scenario is the fact that HEP data is stored in the ROOT data format, which is mostly unknown outside of the HEP community. The work presented in this thesis focuses on the development of a Machine Learning as a Service (MLaaS) solution for HEP, aiming to provide a cloud service that allows HEP users to run ML pipelines via HTTP calls. These pipelines are executed by the MLaaS4HEP framework, which reads data, processes it, and trains ML models directly from ROOT files of arbitrary size held in local or distributed data sources. Such a solution provides HEP users who are not ML experts with a tool that allows them to apply ML techniques to their analyses in a streamlined manner. Over the years the MLaaS4HEP framework has been developed, validated, and tested, and new features have been added. A first MLaaS solution was developed by automating the deployment of a platform equipped with the MLaaS4HEP framework. A service with APIs was then developed, so that a user, after being authenticated and authorized, can submit MLaaS4HEP workflows that produce trained ML models ready for the inference phase. A working prototype of this service currently runs on a virtual machine of INFN-Cloud and meets the requirements to be added to the INFN Cloud portfolio of services.
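
    For orientation, the sketch below shows the kind of workflow MLaaS4HEP automates, reading branches from a ROOT file with the uproot library and training a scikit-learn classifier; the file path, tree name, and branch names are placeholders, and this is not the MLaaS4HEP API itself.

    # Sketch of the workflow MLaaS4HEP automates: read branches from a ROOT file
    # and train a classifier. File path, tree name, and branch names are
    # placeholders; this is not the MLaaS4HEP API.
    import numpy as np
    import uproot
    from sklearn.ensemble import GradientBoostingClassifier

    tree = uproot.open("events.root")["Events"]                  # placeholder file/tree
    arrays = tree.arrays(["feature_a", "feature_b", "label"], library="np")
    X = np.column_stack([arrays["feature_a"], arrays["feature_b"]])
    y = arrays["label"]

    model = GradientBoostingClassifier().fit(X, y)
    print("training accuracy:", model.score(X, y))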

    Software defined wireless network (sdwn) for industrial environment: case of underground mine

    Get PDF
    With the continuous development of Canadian mining industries, establishing advanced wireless underground communication networks has become an essential element of the mining industrial process, both to improve productivity and to ensure communication among miners. This study proposes a mining communication system based on a Software Defined Wireless Network (SDWN) architecture built on LTE communication technology. The most important 4G mobile network platforms were studied, configured, and tested in two different areas: an underground mine tunnel and a narrow indoor corridor. A mobile architecture combining SDWN and NFV (Network Functions Virtualization) was also implemented.

    Fog Computing in IoT Smart Environments via Named Data Networking: A Study on Service Orchestration Mechanisms

    Get PDF
    [EN] By offering low-latency and context-aware services, fog computing will have a peculiar role in the deployment of Internet of Things (IoT) applications for smart environments. Unlike the conventional remote cloud, for which consolidated architectures and deployment options exist, many design and implementation aspects remain open when considering the latest fog computing paradigm. In this paper, we focus on the problems of dynamically discovering the processing and storage resources distributed among fog nodes and, accordingly, orchestrating them for the provisioning of IoT services for smart environments. In particular, we show how these functionalities can be effectively supported by the revolutionary Named Data Networking (NDN) paradigm. Originally conceived to support named content delivery, NDN can be extended to request and provide named computation services, with NDN nodes acting as both content routers and in-network service executors. To substantiate our analysis, we present an NDN fog computing framework with a focus on a smart campus scenario, where the execution of IoT services is dynamically orchestrated and performed by NDN nodes in a distributed fashion. A simulation campaign in ndnSIM, the reference network simulator of the NDN research community, is also presented to assess the performance of our proposal against state-of-the-art solutions. Results confirm the superiority of the proposal in terms of service provisioning time, paid at the expense of a slightly higher amount of traffic exchanged among fog nodes. This research was partially funded by the Italian Government under grant PON ARS01_00836 for the COGITO (A COGnItive dynamic sysTem to allOw buildings to learn and adapt) PON Project.
    Amadeo, M.; Ruggeri, G.; Campolo, C.; Molinaro, A.; Loscri, V.; Tavares De Araujo Cesariny Calafate, C. M. (2019). Fog Computing in IoT Smart Environments via Named Data Networking: A Study on Service Orchestration Mechanisms. Future Internet, 11(11), 1-21. https://doi.org/10.3390/fi11110222
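
    As a toy, assumption-laden illustration of the named-service idea described above (not the paper's framework or the ndnSIM model), the snippet below resolves a hierarchical name by longest-prefix match to either cached content or an executable in-network service.

    # Toy longest-prefix-match resolution of a named request, illustrating the
    # NDN idea of nodes acting as both content routers and service executors.
    # Names and services are invented for illustration.
    SERVICES = {
        "/campus/building1/temperature/average": lambda args: sum(args) / len(args),
    }
    CONTENT_STORE = {
        "/campus/building1/map": "<cached map blob>",
    }

    def longest_prefix_match(name, table):
        parts = name.split("/")
        for i in range(len(parts), 0, -1):
            prefix = "/".join(parts[:i])
            if prefix in table:
                return prefix
        return None

    def satisfy_interest(name, args=()):
        # Prefer cached content; otherwise execute a matching in-network service.
        cached = longest_prefix_match(name, CONTENT_STORE)
        if cached is not None:
            return CONTENT_STORE[cached]
        prefix = longest_prefix_match(name, SERVICES)
        if prefix is not None:
            return SERVICES[prefix](args)
        return None  # would be forwarded upstream in a real NDN node

    print(satisfy_interest("/campus/building1/temperature/average", (20.5, 21.0, 19.5)))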

    5G-PPP Software Network Working Group: Network Applications: Opening up 5G and beyond networks, 5G-PPP projects analysis, Version 2

    Get PDF
    It is expected that the communication fabric and the way network services are consumed will evolve towards 6G, building on and extending the capabilities of 5G and Beyond networks. Service APIs, Operation APIs, and Network APIs are different aspects of network exposure, which gives communication service providers a way to monetize network capabilities. Allowing the developer community to use network capabilities via APIs is an emerging area for network monetization. It is therefore important that network exposure caters for the needs of developers serving different markets, e.g., different vertical industry segments. The concept of "Network Applications" is introduced following this idea and is defined as a set of services that provides certain functionalities to verticals and their associated use cases. Network Applications are more than the introduction of new vertical applications with interaction capabilities: they reflect the need for a separate middleware layer that simplifies the implementation and deployment of vertical systems at large scale. Specifically, third parties or network operators can contribute Network Applications, depending on the level of interaction and trust. In practice, a Network Application uses the APIs exposed by the network and can either be integrated with (part of) a vertical application or expose its own APIs (e.g., service APIs) for further consumption by vertical applications. This paper builds on the findings of the white paper released in 2022. It goes into detail about the implementations of the two major Network Application classes, "aaS" and hybrid models, introduces the Network Applications marketplace, and highlights technological solutions such as the CAMARA project as part of the standardization landscape.
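
    As a purely hypothetical sketch of a vertical application consuming an exposed network API over REST, the example below requests a quality-on-demand session; the endpoint, path, token, and payload fields are invented for illustration and do not correspond to a specific CAMARA or operator API.

    # Hypothetical example of a vertical application consuming an exposed network
    # API (e.g., requesting a quality-on-demand session). Endpoint, path, token,
    # and payload fields are invented for illustration only.
    import requests

    API_BASE = "https://api.example-operator.com"      # placeholder exposure endpoint
    TOKEN = "REPLACE_WITH_OAUTH_TOKEN"                 # placeholder credential

    def request_qos_session(device_ip, profile):
        response = requests.post(
            f"{API_BASE}/qod/v1/sessions",             # illustrative path, not a real spec
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={"device": {"ipv4Address": device_ip}, "qosProfile": profile},
            timeout=10,
        )
        response.raise_for_status()
        return response.json()

    if __name__ == "__main__":
        print(request_qos_session("203.0.113.10", "low-latency"))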