1,771 research outputs found
HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
High Performance Computing (HPC) clouds are becoming an alternative to
on-premise clusters for executing scientific applications and business
analytics services. Most research efforts in HPC cloud aim to understand the
cost-benefit of moving resource-intensive applications from on-premise
environments to public cloud platforms. Industry trends show hybrid
environments are the natural path to get the best of the on-premise and cloud
resources---steady (and sensitive) workloads can run on on-premise resources
and peak demand can leverage remote resources in a pay-as-you-go manner.
Nevertheless, there are plenty of questions to be answered in HPC cloud, which
range from how to extract the best performance from an unknown underlying
platform to what services are essential to make its usage easier. Moreover, the
discussion on the right pricing and contractual models to fit small and large
users is relevant for the sustainability of HPC clouds. This paper presents a
survey and taxonomy of efforts in HPC cloud and a vision of what we believe is
ahead of us, including a set of research challenges that, once tackled, can
help advance businesses and scientific discoveries. This is particularly
relevant given the rapidly growing wave of new HPC applications coming from
big data and artificial intelligence.
Comment: 29 pages, 5 figures. Published in ACM Computing Surveys (CSUR).
On Elastic Language Models
Large-scale pretrained language models have achieved compelling performance
in a wide range of language understanding and information retrieval tasks.
Knowledge distillation offers an opportunity to compress a large language model
to a small one, in order to reach a reasonable latency-performance tradeoff.
However, for scenarios where the number of requests (e.g., queries submitted to
a search engine) is highly variable, the static tradeoff attained by the
compressed language model might not always fit. Once a model is assigned a
static tradeoff, it can be inadequate: the latency is too high when
the number of requests is large, or the performance is too low when the number
of requests is small. To this end, we propose an elastic language model
(ElasticLM) that elastically adjusts the tradeoff according to the request
stream. The basic idea is to introduce compute elasticity to the compressed
language model, so that the tradeoff can vary on the fly along scalable and
controllable compute. Specifically, we impose an elastic structure to equip
ElasticLM with compute elasticity and design an elastic optimization to learn
ElasticLM under compute elasticity. To serve ElasticLM, we apply an elastic
schedule. Considering the specificity of information retrieval, we adapt
ElasticLM to dense retrieval and reranking and present ElasticDenser and
ElasticRanker, respectively. Offline evaluation is conducted on the language
understanding benchmark GLUE and on several information retrieval tasks, including
Natural Questions, TriviaQA, and MS MARCO. The results show that ElasticLM
along with ElasticDenser and ElasticRanker can perform correctly and
competitively compared with an array of static baselines. Furthermore, online
simulation with concurrency is also carried out. The results demonstrate that
ElasticLM can provide elastic tradeoffs with respect to a varying request stream.
Comment: 27 pages, 11 figures, 9 tables
Enabling Distributed Applications Optimization in Cloud Environment
The past few years have seen dramatic growth in the popularity of public cloud services, such as Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Container-as-a-Service (CaaS). In both commercial and scientific fields, quick environment setup and application deployment have become mandatory requirements. As a result, more and more organizations choose cloud environments instead of setting up the environment themselves from scratch. Cloud computing resources such as server engines, orchestration, and the underlying server resources are delivered to users as a service by a cloud provider. Most of the applications that run in public clouds are distributed applications, also called multi-tier applications, which require a set of servers, a service ensemble, that cooperate and communicate to jointly provide a certain service or accomplish a task. However, few research efforts have been devoted to providing an overall solution for distributed application optimization in the public cloud.
In this dissertation, we present three systems that enable distributed application optimization: (1) the first part introduces DocMan, a toolset for detecting containerized applications’ dependencies in CaaS clouds; (2) the second part introduces a system to deal with hot/cold blocks in distributed applications; and (3) the third part introduces a system named FP4S, a novel fragment-based parallel state recovery mechanism that can handle many simultaneous failures for a large number of concurrently running stream applications.
LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing
LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.
Peer Reviewed. Postprint (author's final draft).