Network monitoring in public clouds: issues, methodologies, and applications
Cloud computing adoption is rapidly growing thanks to the large technical and economic advantages it brings.
Its effects can also be observed in the fast increase of cloud traffic: according to recent forecasts,
more than 75% of the overall datacenter traffic
will be cloud traffic by 2018.
Accordingly,
huge investments have been made by providers in network infrastructures.
Networks of geographically distributed datacenters have been built,
which require efficient and accurate monitoring activities to be operated.
However, providers rarely expose information about the state of cloud networks or their design,
and seldom make promises about their performance.
In this scenario,
cloud customers therefore have to cope with performance unpredictability
in spite of the primary role played by the network.
Indeed, according to the deployment practices adopted
and the functional separation of the application layers often implemented,
the network heavily influences the performance of the cloud services,
also impacting costs and revenues.
In this thesis
cloud networks are investigated
using non-cooperative approaches,
i.e., approaches that do not require access to any information restricted to entities involved in the cloud service provision.
A platform to monitor cloud networks from the point of view of the customer is presented.
Such a platform enables general customers, even those with limited expertise in the configuration and the management of cloud resources, to obtain valuable information about the state of the cloud network, according to a set of factors under their control.
A detailed characterization of the cloud network and of its performance is provided,
thanks to extensive experiments performed over recent years
on the infrastructures of the two leading cloud providers
(Amazon Web Services and Microsoft Azure).
The information base gathered by applying the proposed approaches allows
customers to better understand the characteristics of these complex network infrastructures.
Moreover, experimental results are also useful to the provider for understanding the quality of service perceived by customers.
By properly interpreting the obtained results, usage guidelines can be devised
which make it possible to enhance the achievable performance and reduce costs.
As a particular case study, the thesis also shows how monitoring information can be leveraged by the customer
to implement convenient mechanisms to scale cloud resources
without any a priori knowledge.
More generally, we believe that this thesis provides a better-defined picture of the characteristics
of the complex cloud network infrastructures,
also providing the scientific community with useful tools for characterizing them in the future.
Edge computing infrastructure for 5G networks: a placement optimization solution
This thesis focuses on how to optimize the placement of the Edge Computing infrastructure for upcoming 5G networks. To this aim, the core contributions of this research are twofold: 1) a novel heuristic called Hybrid Simulated Annealing to tackle the NP-hard nature of the problem and 2) a framework called EdgeON providing a practical tool for real-life deployment optimization.
In more detail, Edge Computing has grown into a key solution to 5G latency, reliability and scalability requirements. By bringing computing, storage and networking resources to the edge of the network, delay-sensitive applications, location-aware systems and upcoming real-time services leverage the benefits of a reduced physical and logical path between the end-user and the data or service host.
Nevertheless, the edge node placement problem raises critical concerns regarding deployment and operational expenditures (mainly due to the number of nodes to be deployed), current backhaul network capabilities and non-technical placement limitations. Common approaches to the placement of edge nodes are based on Mobile Edge Computing (MEC), where the processing capabilities are deployed at the Radio Access Network nodes, and on Facility Location Problem variations, where a simplistic cost function is used to determine where to optimally place the infrastructure. However, these methods typically lack the flexibility to be used for edge node placement under the strict technical requirements identified for 5G networks. They fail to place resources at the network edge for 5G ultra-dense networking environments in a network-aware manner.
This doctoral thesis focuses on rigorously defining the Edge Node Placement Problem (ENPP) for 5G use cases and proposes a novel framework called EdgeON aiming at reducing the overall expenses when deploying and operating an Edge Computing network, taking into account the usage and characteristics of the in-place backhaul network and the strict requirements of a 5G-EC ecosystem. The developed framework implements several placement and optimization strategies thoroughly assessing its suitability to solve the network-aware ENPP. The core of the framework is an in-house developed heuristic called Hybrid Simulated Annealing (HSA), seeking to address the high complexity of the ENPP while avoiding the non-convergent behavior of other traditional heuristics (i.e., when applied to similar problems).
The findings of this work validate our approach to solve the network-aware ENPP, the effectiveness of the heuristic proposed and the overall applicability of EdgeON. Thorough performance evaluations were conducted on the core placement solutions implemented revealing the superiority of HSA when compared to widely used heuristics and common edge placement approaches (i.e., a MEC-based strategy). Furthermore, the practicality of EdgeON was tested through two main case studies placing services and virtual network functions over the previously optimally placed edge nodes.
Overall, our proposal is an easy-to-use, effective and fully extensible tool that can be used by operators seeking to optimize the placement of computing, storage and networking infrastructure in the users’ vicinity. Therefore, our main contributions not only set strong foundations towards a cost-effective deployment and operation of an Edge Computing network, but also directly impact the feasibility of upcoming 5G services/use cases and the extensive existing research regarding the placement of services and even network service chains at the edge.
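To make the kind of heuristic mentioned in this abstract more concrete, the following is a minimal sketch of plain simulated annealing applied to a toy placement problem. It is not the thesis's Hybrid Simulated Annealing and not part of EdgeON: the cost model (a fixed cost per open node plus each user's straight-line distance to its nearest open node), the function names and all parameters are assumptions made purely for illustration.

```python
import math
import random


def placement_cost(open_sites, users, sites, node_cost):
    """Toy cost (assumed): fixed cost per open node plus each user's
    distance to its nearest open node."""
    if not open_sites:
        return math.inf  # a placement with no nodes serves nobody
    access = sum(min(math.dist(u, sites[s]) for s in open_sites) for u in users)
    return node_cost * len(open_sites) + access


def anneal_placement(users, sites, node_cost, steps=2000, t0=10.0,
                     alpha=0.995, seed=0):
    """Plain simulated annealing over subsets of candidate sites."""
    rng = random.Random(seed)
    current = set(range(len(sites)))  # start with every candidate site open
    current_cost = placement_cost(current, users, sites, node_cost)
    best, best_cost = set(current), current_cost
    temp = t0
    for _ in range(steps):
        # Neighbour move: toggle one randomly chosen site open or closed.
        candidate = current ^ {rng.randrange(len(sites))}
        cand_cost = placement_cost(candidate, users, sites, node_cost)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = set(current), current_cost
        temp *= alpha  # geometric cooling schedule
    return best, best_cost


users = [(0, 0), (0, 1), (10, 10), (10, 11)]
sites = [(0, 0), (10, 10), (5, 5)]
best, best_cost = anneal_placement(users, sites, node_cost=1.0)
print(sorted(best), best_cost)
```

A network-aware formulation such as the one the abstract describes would replace the distance term with backhaul-dependent costs and constraints; the acceptance rule and cooling loop are the only parts this sketch is meant to show.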
Change Management Systems for Seamless Evolution in Data Centers
Revenue for data centers today is highly dependent on the satisfaction of their enterprise customers. These customers often require various features to migrate their businesses and operations to the cloud. Thus, clouds today introduce new features at a swift pace to onboard new customers and to meet the needs of existing ones. This pace of innovation continues to grow superlinearly, e.g., Amazon deployed 1400 new features in 2017. However, such a rapid pace of evolution adds complexities both for users and the cloud. Clouds struggle to keep up with the deployment speed, and users struggle to learn which features they need and how to use them. The pace of these evolutions has brought us to a tipping point: we can no longer use rules of thumb to deploy new features, and customers need help to identify which features they need. We have built two systems, Janus and Cherrypick, to address these problems. Janus helps data center operators roll out new changes to the data center network. It automatically adapts to the data center topology, routing, traffic, and failure settings. The system reduces the risk of new deployments for network operators as they can now pick deployment strategies which are less likely to impact users’ performance. Cherrypick finds near-optimal cloud configurations for big data analytics. It allows users to search through the new machine types the clouds are constantly introducing and find ones with a near-optimal performance that meets their budget. Cherrypick can adapt to new big-data frameworks and applications as well as the new machine types the clouds are constantly introducing. As the pace of cloud innovations increases, it is critical to have tools that allow operators to deploy new changes as well as those that would enable users to adapt to achieve good performance at low cost. The tools and algorithms discussed in this thesis help accomplish these goals.
The Open MatSci ML Toolkit: A Flexible Framework for Machine Learning in Materials Science
We present the Open MatSci ML Toolkit: a flexible, self-contained, and
scalable Python-based framework to apply deep learning models and methods on
scientific data with a specific focus on materials science and the OpenCatalyst
Dataset. Our toolkit provides: 1. A scalable machine learning workflow for
materials science leveraging PyTorch Lightning, which enables seamless scaling
across different computation capabilities (laptop, server, cluster) and
hardware platforms (CPU, GPU, XPU). 2. Deep Graph Library (DGL) support for
rapid graph neural network prototyping and development. By publishing and
sharing this toolkit with the research community via open-source release, we
hope to: 1. Lower the entry barrier for new machine learning researchers and
practitioners that want to get started with the OpenCatalyst dataset, which
presently comprises the largest computational materials science dataset. 2.
Enable the scientific community to apply advanced machine learning tools to
high-impact scientific challenges, such as modeling of materials behavior for
clean energy applications. We demonstrate the capabilities of our framework by
enabling three new equivariant neural network models for multiple OpenCatalyst
tasks and arrive at promising results for compute scaling and model
performance. Comment: Paper accompanying Open-Source Software from
https://github.com/IntelLabs/matscim
Source-Routed Multicast Schemes for Large-Scale Cloud Data Center Networks
Data centers (DCs) have been witnessing unprecedented growth in size, number and complexity in recent years. They consist of tens of thousands of servers interconnected by fast network switches, hosting and enabling numerous applications with various traffic characteristics and requirements. As a result, DC networks have been presented with several unique challenges, pertaining to the scaling and allocation of network resources during the forwarding and moving of data across the different DC servers. Traffic routing in general and multicast routing in particular are important functions in DC networks, especially since modern cloud DCs tend to exhibit one-to-many communication traffic patterns. Unfortunately, recent multicast routing approaches that adopt IP multicast suffer from load balancing issues and do not scale well with the number of supported multicast groups when used for cloud DC networks. In this thesis, we propose a set of new, complementary schemes that overcome these challenges. More specifically, first, we study existing DC network topologies and propose the Circulant Fat-Tree topology, an improvement over the traditional Fat-Tree topology with properties better suited to today's DC networks. Then, we review and classify recent studies that investigate and measure the traffic behavior of operational DC networks. We focus on the way they collect the traffic as well as on the key findings made in these studies.
Secondly, we propose Bert, a source-initiated multicast routing scheme for DCs. Bert scales well with both the number and the size of multicast groups, and does so through clustering, by dividing the members of the multicast group into a set of clusters with each cluster employing its own forwarding rules. In essence, Bert yields much lower multicast traffic overhead than state-of-the-art schemes.
Thirdly, we propose Ernie, a scalable and load-balanced multicast source routing scheme. Ernie introduces a novel method for scaling out the number of supported multicast groups. In particular, it appropriately constructs and organizes multicast header information inside packets in a manner that allows core/root switches to only forward down the needed information. Ernie also introduces an effective multicast traffic load balancing technique across downstream links. Specifically, it prudently assigns multicast groups to core switches to ensure the evenness of load distribution across the downstream links.
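The idea of assigning multicast groups to core switches so that load stays even can be illustrated with a minimal sketch. The abstract does not describe Ernie's actual assignment rule, so the code below substitutes a generic greedy heuristic (largest group first, always onto the currently least-loaded switch); the function name, the per-group traffic rates and the switch numbering are hypothetical.

```python
import heapq


def assign_groups(group_rates, num_core_switches):
    """Greedy least-loaded assignment (illustrative, not Ernie's rule).

    group_rates: dict mapping a group id to its assumed traffic rate.
    Returns (assignment, loads): group id -> switch index, and the
    resulting load per switch.
    """
    # Min-heap of (current load, switch index).
    heap = [(0.0, s) for s in range(num_core_switches)]
    heapq.heapify(heap)
    assignment = {}
    # Place the heaviest groups first so lighter ones can even things out.
    for group, rate in sorted(group_rates.items(),
                              key=lambda kv: kv[1], reverse=True):
        load, switch = heapq.heappop(heap)
        assignment[group] = switch
        heapq.heappush(heap, (load + rate, switch))
    loads = [0.0] * num_core_switches
    for load, switch in heap:
        loads[switch] = load
    return assignment, loads


rates = {"g1": 5.0, "g2": 3.0, "g3": 3.0, "g4": 1.0}
assignment, loads = assign_groups(rates, num_core_switches=2)
print(assignment, loads)
```

In this example the heaviest group lands alone on one switch at first, and the remaining groups fill in until both switches carry the same total rate.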