20 research outputs found
Deploying AI Frameworks on Secure HPC Systems with Containers
The increasing interest in the usage of Artificial Intelligence techniques
(AI) from the research community and industry to tackle "real world" problems,
requires High Performance Computing (HPC) resources to efficiently compute and
scale complex algorithms across thousands of nodes. Unfortunately, typical data
scientists are not familiar with the unique requirements and characteristics of
HPC environments. They usually develop their applications with high-level
scripting languages or frameworks such as TensorFlow and the installation
process often requires connection to external systems to download open source
software during the build. HPC environments, on the other hand, are often based
on closed source applications that incorporate parallel and distributed
computing API's such as MPI and OpenMP, while users have restricted
administrator privileges, and face security restrictions such as not allowing
access to external systems. In this paper we discuss the issues associated with
the deployment of AI frameworks in a secure HPC environment and how we
successfully deploy AI frameworks on SuperMUC-NG with Charliecloud.Comment: 6 pages, 2 figures, 2019 IEEE High Performance Extreme Computing
Conferenc
funcX: A Federated Function Serving Fabric for Science
Exploding data volumes and velocities, new computational methods and
platforms, and ubiquitous connectivity demand new approaches to computation in
the sciences. These new approaches must enable computation to be mobile, so
that, for example, it can occur near data, be triggered by events (e.g.,
arrival of new data), be offloaded to specialized accelerators, or run remotely
where resources are available. They also require new design approaches in which
monolithic applications can be decomposed into smaller components, that may in
turn be executed separately and on the most suitable resources. To address
these needs we present funcX---a distributed function as a service (FaaS)
platform that enables flexible, scalable, and high performance remote function
execution. funcX's endpoint software can transform existing clouds, clusters,
and supercomputers into function serving systems, while funcX's cloud-hosted
service provides transparent, secure, and reliable function execution across a
federated ecosystem of endpoints. We motivate the need for funcX with several
scientific case studies, present our prototype design and implementation, show
optimizations that deliver throughput in excess of 1 million functions per
second, and demonstrate, via experiments on two supercomputers, that funcX can
scale to more than more than 130000 concurrent workers.Comment: Accepted to ACM Symposium on High-Performance Parallel and
Distributed Computing (HPDC 2020). arXiv admin note: substantial text overlap
with arXiv:1908.0490
Recommended from our members
Containerization on Petascale HPC Clusters
Containerization technologies provide a mechanism to encapsulate applications and many of their dependencies, facilitating software portability and reproducibility on HPC systems. However, in order to access many of the architectural features that enable HPC system performance, compatibility between certain components of the container and host are required, resulting in a trade-off between portability and performance. In this work, we discuss our early experiences running three state-of-the-art containerization technologies on the petascale Frontera system. We present how we build the containers to ensure performance and security and their performance at scale.We ran microbenchmarks at a scale of 4,096 nodes and demonstrate the near-native performance and minimal memory overheads by the containerized environments at 70,000 processes on 1,296 nodes with a scientific application MILC - a quantum chromodynamics code.UT Austin-Portugal Program, a collaboration between the Portuguese Foundation of Science and Technology and the University of Texas at Austin, award UTA18-001217Texas Advanced Computing Center (TACC
Heterogeneity, High Performance Computing, Self-Organization and the Cloud
application; blueprints; self-management; self-organisation; resource management; supply chain; big data; PaaS; Saas; HPCaa
Heterogeneity, High Performance Computing, Self-Organization and the Cloud
application; blueprints; self-management; self-organisation; resource management; supply chain; big data; PaaS; Saas; HPCaa
Heterogeneity, high performance computing, self-organization and the Cloud
This open access book addresses the most recent developments in cloud computing such as HPC in the Cloud, heterogeneous cloud, self-organising and self-management, and discusses the business implications of cloud computing adoption. Establishing the need for a new architecture for cloud computing, it discusses a novel cloud management and delivery architecture based on the principles of self-organisation and self-management. This focus shifts the deployment and optimisation effort from the consumer to the software stack running on the cloud infrastructure. It also outlines validation challenges and introduces a novel generalised extensible simulation framework to illustrate the effectiveness, performance and scalability of self-organising and self-managing delivery models on hyperscale cloud infrastructures. It concludes with a number of potential use cases for self-organising, self-managing clouds and the impact on those businesses
Intelligent Computing: The Latest Advances, Challenges and Future
Computing is a critical driving force in the development of human
civilization. In recent years, we have witnessed the emergence of intelligent
computing, a new computing paradigm that is reshaping traditional computing and
promoting digital revolution in the era of big data, artificial intelligence
and internet-of-things with new computing theories, architectures, methods,
systems, and applications. Intelligent computing has greatly broadened the
scope of computing, extending it from traditional computing on data to
increasingly diverse computing paradigms such as perceptual intelligence,
cognitive intelligence, autonomous intelligence, and human-computer fusion
intelligence. Intelligence and computing have undergone paths of different
evolution and development for a long time but have become increasingly
intertwined in recent years: intelligent computing is not only
intelligence-oriented but also intelligence-driven. Such cross-fertilization
has prompted the emergence and rapid advancement of intelligent computing.
Intelligent computing is still in its infancy and an abundance of innovations
in the theories, systems, and applications of intelligent computing are
expected to occur soon. We present the first comprehensive survey of literature
on intelligent computing, covering its theory fundamentals, the technological
fusion of intelligence and computing, important applications, challenges, and
future perspectives. We believe that this survey is highly timely and will
provide a comprehensive reference and cast valuable insights into intelligent
computing for academic and industrial researchers and practitioners