
    tf-Darshan: Understanding Fine-grained I/O Performance in Machine Learning Workloads

    Machine Learning applications on HPC systems have been gaining popularity in recent years. The upcoming large-scale systems will offer tremendous parallelism for training through GPUs. However, another heavy aspect of Machine Learning is I/O, which can become a performance bottleneck. TensorFlow, one of the most popular Deep-Learning platforms, now offers a new profiler interface and allows instrumentation of TensorFlow operations. However, the current profiler only enables analysis at the TensorFlow platform level and does not provide system-level information. In this paper, we extend the TensorFlow Profiler and introduce tf-Darshan, both a profiler and a tracer, that performs instrumentation through Darshan. We use the same Darshan shared instrumentation library and implement a runtime attachment without using a system preload. We extract Darshan profiling data structures during TensorFlow execution to enable analysis through the TensorFlow profiler, and we visualize the performance results through TensorBoard, the web-based TensorFlow visualization tool. At the same time, we do not alter Darshan's existing implementation. We illustrate tf-Darshan with two case studies on ImageNet image classification and malware classification. We show that, by guiding optimization using data from tf-Darshan, we increase POSIX I/O bandwidth by up to 19% by selecting data for staging on fast-tier storage. We also show that Darshan has the potential to be used as a runtime library for profiling and providing information for future optimization. Comment: Accepted for publication at the 2020 International Conference on Cluster Computing (CLUSTER 2020).
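
    As a point of reference for how such traces are collected, the sketch below uses only the stock TensorFlow profiler API around a tf.data input pipeline; tf-Darshan augments traces like this with Darshan's POSIX I/O counters. The paths, file glob, and iteration count are illustrative assumptions, not the paper's setup.

    # Minimal sketch: capture a TensorFlow profiler trace around a tf.data input
    # pipeline. tf-Darshan enriches such traces with Darshan I/O data; this
    # snippet sticks to the standard tf.profiler API. Paths are placeholders.
    import tensorflow as tf

    logdir = "/tmp/tb_logs"                                        # assumed TensorBoard log dir
    files = tf.data.Dataset.list_files("/data/imagenet/train-*")   # assumed input file glob

    def read_record(path):
        # Reading each file from storage is the POSIX I/O that a Darshan-based
        # tracer would attribute to this dataset operation.
        return tf.io.read_file(path)

    ds = files.map(read_record, num_parallel_calls=tf.data.AUTOTUNE).prefetch(tf.data.AUTOTUNE)

    tf.profiler.experimental.start(logdir)   # begin trace collection
    for _ in ds.take(100):                   # drive a slice of the input pipeline
        pass
    tf.profiler.experimental.stop()          # write the trace for TensorBoard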

    Deployment and Operation of Complex Software in Heterogeneous Execution Environments

    This open access book provides an overview of the work developed within the SODALITE project, which aims at facilitating the deployment and operation of distributed software on top of heterogeneous infrastructures, including cloud, HPC, and edge resources. The experts participating in the project describe how SODALITE works and how it can be exploited by end users. While multiple languages and tools are available in the literature to support DevOps teams in the automation of deployment and operation steps, these activities still require specific know-how and skills that cannot be found in average teams. The SODALITE framework tackles this problem by offering modelling and smart editing features that allow those we call Application Ops Experts to work without knowing low-level details about the adopted, potentially heterogeneous, infrastructures. The framework also offers mechanisms to verify the quality of the defined models, generate the corresponding executable infrastructural code, automatically wrap application components within proper execution containers, orchestrate all activities concerned with the deployment and operation of all system components, and support on-the-fly self-adaptation and refactoring.
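
    Purely as an illustration of the workflow the abstract describes (model, validate, generate), the toy Python sketch below declares components bound to cloud, HPC, or edge targets, checks the model, and emits container commands. It is a hypothetical stand-in, not SODALITE's TOSCA-based modelling language or its generated infrastructural code.

    # Hypothetical toy deployment model: not SODALITE's modelling language or
    # tooling, just a sketch of "define components, validate the model, generate
    # executable deployment commands" in plain Python.
    from dataclasses import dataclass

    @dataclass
    class Component:
        name: str
        image: str      # container image wrapping the application component
        target: str     # "cloud", "hpc", or "edge"

    MODEL = [
        Component("preprocessor", "registry.example/preproc:1.0", "edge"),
        Component("trainer", "registry.example/trainer:2.3", "hpc"),
        Component("api", "registry.example/api:1.1", "cloud"),
    ]

    def validate(model):
        # Stand-in for model quality checks.
        allowed = {"cloud", "hpc", "edge"}
        for c in model:
            assert c.image and c.target in allowed, f"invalid component: {c.name}"

    def emit_commands(model):
        # Stand-in for generating executable infrastructural code.
        return [f"docker run --label target={c.target} {c.image}" for c in model]

    validate(MODEL)
    print("\n".join(emit_commands(MODEL)))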

    Digital transformation of peatland eco-innovations (‘Paludiculture’): Enabling a paradigm shift towards the real-time sustainable production of ‘green-friendly’ products and services

    The world is heading in the wrong direction on carbon emissions: we are not on track to limit global warming to 1.5 degrees C, and Ireland is among the countries where overall emissions have continued to rise. The development of wettable peatland products and services (termed 'Paludiculture') presents significant opportunities for enabling a transition away from peat harvesting (fossil fuels) towards 'green' eco-innovations. However, this must be balanced with sustainable carbon sequestration and environmental protection. This complex transition from 'brown to green' must be met in real time by enabling digital technologies across the full value chain. This will potentially necessitate the creation of new green-business models with the potential to support disruptive innovation. This timely paper describes the digital transformation of paludiculture-based eco-innovation that may lead to a paradigm shift towards using smart digital technologies to address the efficiency of products and services, along with future-proofing for climate change. Digital transformation of paludiculture also aligns with 'Industry 5.0', a human-centric solution. However, companies supporting peatland innovation may lack the necessary standards, data-sharing practices, or capabilities, which can undermine viable business-model propositions and jeopardize economic, political, and social sustainability. Digital solutions may reduce costs, increase productivity, improve product development, and achieve faster time to market for paludiculture. Digitisation also enables information systems to be open, interoperable, and user-friendly. This constitutes the first study to describe the digital transformation of paludiculture, both vertically and horizontally, in order to inform sustainability, including process automation via AI, machine learning, IoT-Cloud-informed sensors and robotics, virtual and augmented reality, and blockchain for cyber-physical systems. Thus, the aim of this paper is to describe the applicability of digital transformation to actualize the benefits and opportunities of paludiculture activities and enterprises in the Irish midlands with a global orientation.

    Generic Metadata Handling in Scientific Data Life Cycles

    Scientific data life cycles define how data is created, handled, accessed, and analyzed by users. Such data life cycles become increasingly sophisticated as the sciences they serve grow more demanding and complex with the coming advent of exascale data and computing. The overarching data life cycle management background includes multiple abstraction categories: data sources, data and metadata management, computing and workflow management, security, data sinks, and methods to enable utilization. Challenges in this context are manifold. One is to hide the complexity from the user and to enable the seamless use of resources with good usability and efficiency. Another is to enable generic metadata management that is not restricted to one use case but can be adapted with limited effort to further ones. Metadata management is essential to enable scientists to save time by avoiding the need to manually keep track of data, for example by its content and location. As the number of files grows into the millions, managing data without metadata becomes increasingly difficult. Thus, the solution is to employ metadata management to enable the organization of data based on information about it. Previously, use cases tended to support only highly specific metadata management or none at all. Now, a generic metadata management concept is available that can be used to efficiently integrate metadata capabilities with use cases. The concept was implemented within the MoSGrid data life cycle, which enables molecular simulations on distributed HPC-enabled data and computing infrastructures. The implementation enables easy-to-use and effective metadata management. Automated extraction, annotation, and indexing of metadata were designed, developed, and integrated, and search capabilities are provided via a seamless user interface. Further analysis runs can be started directly from search results. A complete evaluation of the concept, both in general and along the example implementation, is presented. In conclusion, the generic metadata management concept advances the state of the art in scientific data life cycle management.
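
    The general pattern behind such automated extraction, annotation, indexing, and search can be sketched as follows. This is a generic illustration with hypothetical extractor fields (file extension, size), not the MoSGrid implementation.

    # Generic sketch of metadata-driven data organization: extract simple
    # metadata per file, index it, and answer searches from the index.
    # Illustrative only; the MoSGrid implementation is not reproduced here.
    import os
    from collections import defaultdict

    def extract_metadata(path):
        # Hypothetical extractor: a real life cycle would parse domain content
        # (e.g. simulation inputs/outputs); here we record extension and size.
        return {
            "file": path,
            "extension": os.path.splitext(path)[1].lstrip("."),
            "size_bytes": os.path.getsize(path),
        }

    def build_index(root):
        index = defaultdict(list)                  # extension -> metadata records
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                record = extract_metadata(os.path.join(dirpath, name))
                index[record["extension"]].append(record)
        return index

    def search(index, extension):
        # Search results could directly seed further analysis runs.
        return index.get(extension, [])

    if __name__ == "__main__":
        idx = build_index(".")                     # index the current directory tree
        print(search(idx, "log"))                  # e.g. locate all indexed .log files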

    Bridging a Gap Between Research and Production: Contributions to Scheduling and Simulation

    Large-scale distributed computing infrastructures (e.g., data centers, grids, or clouds) are used by scientists from various domains to produce outstanding research results, such as the discovery of the Higgs boson in High Energy Physics. These infrastructures are also studied by Computer Scientists to produce their own set of scientific results. Ideally, a virtuous circle should exist between Domain and Computer Scientists: the former raising challenges that could be addressed by the latter. Unfortunately, on many occasions, a gap exists that prevents such an ideal and fruitful collaboration. This habilitation covers research work conducted in the fields of scheduling and simulation that contributes to filling this gap. It discusses the necessary conditions to achieve this goal and details concrete initiatives in this endeavor.

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as the final publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to provide better discernment of the domain at hand, their representation becomes increasingly demanding in terms of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication, and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    On Improving Efficiency of Data-Intensive Applications in Geo-Distributed Environments

    Distributed systems are pervasively demanded and adopted nowadays for processing data-intensive workloads, since they greatly accelerate large-scale data processing with scalable parallelism and improved data locality. Traditional distributed systems initially targeted computing clusters but have since evolved to data centers with multiple clusters. These systems are mostly built on top of homogeneous, tightly integrated resources connected by high-speed local-area networks (LANs), and they typically require data to be ingested into a central data center for processing. Today, with enormous volumes of data continuously generated from geographically distributed locations, direct adoption of such systems is prohibitively inefficient due to limited system scalability and the high cost of centralizing the geo-distributed data over wide-area networks (WANs). More commonly, it has become a trend to build geo-distributed systems in which data processing jobs are performed on top of geo-distributed, heterogeneous resources in proximity to the data at vastly distributed geo-locations. However, the critical challenges and mechanisms for efficient execution of data-intensive applications in such geo-distributed environments remain largely unclear. The goal of this dissertation is to identify such challenges and mechanisms by extensively applying the research principles and methodology of conventional distributed systems to the geo-distributed environment, and by developing new techniques to tackle these challenges and run data-intensive applications efficiently at scale. The contributions of this dissertation are threefold. Firstly, the dissertation shows that the high level of resource heterogeneity exhibited in the geo-distributed environment undermines the scalability of geo-distributed systems. Virtualization-based resource abstraction mechanisms are introduced to abstract the hardware, network, and OS resources throughout the system, mitigating the underlying resource heterogeneity and enhancing system scalability. Secondly, the dissertation reveals the overwhelming performance and monetary cost incurred by excessive data sharing over the WANs in geo-distributed systems. Network optimization approaches, including linear-programming-based global optimization, greedy bin-packing heuristics, and TCP enhancement, are developed to optimize network resource utilization and circumvent unnecessary expenses imposed on data sharing in WANs. Lastly, the dissertation highlights the importance of data locality for data-intensive applications running in the geo-distributed environment. Novel data caching and locality-aware scheduling techniques are devised to improve data locality.
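
    As a generic illustration of the locality-aware placement idea mentioned above (and not the dissertation's actual scheduler), the sketch below prefers the site that already holds a task's input data and falls back to the least-loaded site; all task, dataset, and site names are hypothetical.

    # Generic sketch of locality-aware task placement across geo-distributed
    # sites: run a task where its input data already resides when capacity
    # allows, otherwise fall back to the site with the most free slots.
    def schedule(tasks, data_location, capacity):
        """tasks: {task: dataset}; data_location: {dataset: site};
        capacity: {site: free slots}. Returns {task: site}."""
        placement = {}
        for task, dataset in tasks.items():
            local_site = data_location.get(dataset)
            if local_site is not None and capacity.get(local_site, 0) > 0:
                site = local_site                       # data-local: avoids a WAN transfer
            else:
                site = max(capacity, key=capacity.get)  # least-loaded fallback
            capacity[site] -= 1
            placement[task] = site
        return placement

    if __name__ == "__main__":
        tasks = {"t1": "logsA", "t2": "logsB", "t3": "logsA"}
        data_location = {"logsA": "eu-site", "logsB": "us-site"}
        capacity = {"eu-site": 1, "us-site": 2}
        # t1 is placed data-locally at eu-site; t3 falls back to us-site once
        # eu-site has no free slots left.
        print(schedule(tasks, data_location, capacity))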