34 research outputs found
RISC-V-Based Platforms for HPC: Analyzing Non-functional Properties for Future HPC and Big-Data Clusters
High-Performance Computing (HPC) has evolved to the point where it is routinely used to simulate systems for which physical experimentation is prohibitively impractical, expensive, or dangerous. This paper provides a general overview and showcases the analysis of non-functional properties
in RISC-V-based platforms for HPC. In particular, our analyses target the evaluation of power and energy control, thermal management, and reliability assessment of promising systems, structures, and technologies devised for current and future generations of HPC machines. The main set of design methodologies and technologies developed within the activities of the Future HPC & Big Data spoke of the National Centre of HPC, Big Data and Quantum Computing project is described, along with a description of the testbed for experimenting with two-phase cooling approaches
Anomaly Detection and Anticipation in High Performance Computing Systems
In their quest toward exascale, High Performance Computing (HPC) systems are rapidly becoming larger and more complex, and so are the issues concerning their maintenance. Fortunately, many current HPC systems are endowed with data monitoring infrastructures that characterize the system state, and whose data can be used to train Deep Learning (DL) anomaly detection models, a very popular research area. However, the lack of labels describing the state of the system is a widespread issue: annotating data is a costly task that generally falls on human system administrators and thus does not scale toward exascale. In this article we investigate the possibility of extracting labels from a service monitoring tool (Nagios) currently used by HPC system administrators to flag nodes undergoing maintenance operations. This allows data collected by a fine-grained monitoring infrastructure to be annotated automatically; the labelled data is then used to train and validate a DL model for anomaly detection. We conduct the experimental evaluation on a tier-0 production supercomputer hosted at CINECA, Bologna, Italy. The results reveal that the DL model can accurately detect real failures and, moreover, can predict the onset of anomalies by systematically anticipating the actual labels (i.e., the moment when system administrators realize that an anomalous event has happened); the average advance time computed on historical traces is around 45 minutes. The proposed technology can be easily scaled toward exascale systems to ease their maintenance
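The core idea of the abstract above — turning coarse service-monitor state changes into per-sample anomaly labels — can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the event format and function names are our assumptions, standing in for Nagios state logs and the fine-grained monitoring samples.

```python
from datetime import datetime, timedelta

def label_samples(sample_times, events):
    """Hypothetical sketch: events is a list of (timestamp, state) pairs,
    with state 'DOWN' when a node is flagged for maintenance and 'UP' when
    it returns to service. Returns one label per monitoring sample:
    1 while the node is flagged DOWN, else 0."""
    down_windows = []
    start = None
    for ts, state in sorted(events):
        if state == "DOWN" and start is None:
            start = ts
        elif state == "UP" and start is not None:
            down_windows.append((start, ts))
            start = None
    if start is not None:  # node still down at end of trace
        down_windows.append((start, datetime.max))
    return [int(any(a <= t < b for a, b in down_windows))
            for t in sample_times]

# Fine-grained samples every 5 minutes; one maintenance window from 7 to 18 min.
t0 = datetime(2023, 1, 1)
samples = [t0 + timedelta(minutes=5 * i) for i in range(6)]
events = [(t0 + timedelta(minutes=7), "DOWN"),
          (t0 + timedelta(minutes=18), "UP")]
print(label_samples(samples, events))  # → [0, 0, 1, 1, 0, 0]
```

The resulting labels could then feed a supervised or semi-supervised DL model, removing the need for manual annotation by administrators.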
ExaMon-X: a Predictive Maintenance Framework for Automatic Monitoring in Industrial IoT Systems
In recent years, the Industrial Internet of Things (IIoT) has led to significant steps forward in many industries, thanks to the exploitation of several technologies, ranging from Big Data processing to Artificial Intelligence (AI). Among the various IIoT scenarios, large-scale data centers can reap significant benefits from adopting Big Data analytics and AI-boosted approaches, since these technologies enable effective predictive maintenance. However, most currently available off-the-shelf solutions are not ideally suited to the HPC context: for example, they do not sufficiently take into account the very heterogeneous data sources and the privacy issues that hinder the adoption of cloud solutions, or they do not fully
exploit the computing capabilities available on site in a supercomputing facility. In this paper we tackle this issue and propose a holistic, vertical IIoT framework for predictive maintenance in supercomputers. The framework is based on a lightweight Big Data monitoring infrastructure, specialized databases suited to heterogeneous data, and a set of high-level AI-based functionalities tailored to the specific needs of HPC actors. We present the deployment and assess the usage of this framework on several in-production HPC systems
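One kind of high-level functionality such a framework could expose is a per-node health score computed from heterogeneous metrics. The sketch below is purely illustrative (the data layout, metric names, and threshold are our assumptions, not ExaMon-X's API): each node's current readings are compared against fleet history via z-scores, and outliers are flagged for maintenance attention.

```python
import statistics

def node_scores(history, current):
    """history: {metric: [past values across the fleet]};
    current: {node: {metric: latest value}}.
    Returns {node: mean absolute z-score over its metrics}."""
    stats = {m: (statistics.mean(v), statistics.stdev(v))
             for m, v in history.items()}
    scores = {}
    for node, metrics in current.items():
        zs = [abs(val - stats[m][0]) / stats[m][1]
              for m, val in metrics.items()]
        scores[node] = sum(zs) / len(zs)
    return scores

# Heterogeneous sources: thermal sensors and power meters (values invented).
history = {"temp_c": [45, 47, 46, 44, 48],
           "power_w": [300, 310, 305, 295, 290]}
current = {"node01": {"temp_c": 46, "power_w": 302},
           "node02": {"temp_c": 70, "power_w": 420}}  # clearly anomalous
scores = node_scores(history, current)
flagged = [n for n, s in scores.items() if s > 3.0]
print(flagged)  # → ['node02']
```

In a real deployment the statistics would come from the specialized time-series databases the paper describes, and the scoring model would be an AI component rather than a simple z-score.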
Sustainable development in contract logistics through green warehousing and distribution. Practical case: Maersk warehouse
This thesis is dedicated to a staple part of contract logistics: warehousing and distribution. The aim of the study is to show the sustainable development of green warehousing, the crucial changes and achievements in the sector, and how these improvements affect the environment and society. The author applies research and analytical methods, investigating sources and related documents to analyse the global situation in contract logistics and describe the current state of green warehousing. A practical method is used to study the Maersk warehouse, to show whether it indeed meets the requirements of sustainability, and to highlight the transformations leading the company towards decarbonisation. The first part is dedicated to general research on warehousing and distribution. The author investigates the main features and crucial importance of warehouse location, studies different types of layouts, shows how the modernisation and application of WMS facilitates operations, examines the efficiency of equipment usage and of lighting and air-conditioning systems, and analyses the social impact of warehousing on sustainability. The second chapter examines the practical case of the Maersk warehouse, the company's first logistics centre in the Iberian area. The author not only investigates the main features of the warehouse, but also shows how the company is implementing sustainable tools to reduce its environmental impact, and offers a glance at the future of warehousing through innovation and technology
Progressive introduction of network softwarization in operational telecom networks: advances at architectural, service and transport levels
Technological paradigms such as Software Defined Networking, Network Function
Virtualization and Network Slicing are altogether offering new ways of providing services.
This process is widely known as Network Softwarization, whereby traditional operational
networks adopt capabilities and mechanisms inherited from the computing world, such as
programmability, virtualization and multi-tenancy.
This adoption brings a number of challenges, from both the technological and operational
perspectives. On the other hand, these paradigms provide an unprecedented flexibility, opening
opportunities to develop new services and new ways of exploiting and consuming telecom
networks.
This Thesis first overviews the implications of the progressive introduction of network
softwarization in operational networks, and later details some advances at different levels,
namely the architectural, service and transport levels. This is done through specific exemplary use
cases and evolution scenarios, with the goal of illustrating both new possibilities and existing
gaps in the ongoing transition towards an advanced future mode of operation.
The analysis is performed from the perspective of a telecom operator, paying special attention to
how to integrate all these paradigms into operational networks to assist in their evolution
towards new, more sophisticated service demands.
Doctoral Programme in Telematic Engineering, Universidad Carlos III de Madrid. President: Eduardo Juan Jacob Taquet. Secretary: Francisco Valera Pintor. Member: Jorge López Vizcaín
Milestones in Autonomous Driving and Intelligent Vehicles Part I: Control, Computing System Design, Communication, HD Map, Testing, and Human Behaviors
Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing
rapidly due to their convenience, safety, and economic benefits. Although
a number of surveys have reviewed research achievements in this field, they are
still limited to specific tasks and lack systematic summaries and future research
directions. Our work is divided into 3 independent articles: the first is a
Survey of Surveys (SoS) covering the complete range of AD and IV technologies,
which recounts the history, summarizes the milestones, and provides
perspectives, ethics, and future research directions. This is the second part
(Part I of this technical survey), which reviews the development of control,
computing system design, communication, High Definition maps (HD maps), testing,
and human behaviors in IVs. In addition, the third part (Part II of this
technical survey) reviews the perception and planning sections. The objective of
this paper is to cover all the sections of AD, summarize the latest technical
milestones, and help newcomers quickly understand the development of AD
and IVs. Combining the SoS and Part II, we anticipate that this work will bring
novel and diverse insights to researchers and newcomers, and serve as a bridge
between past and future.
Comment: 18 pages, 4 figures, 3 tables
pAElla: Edge-AI based Real-Time Malware Detection in Data Centers
The increasing use of Internet-of-Things (IoT) devices for monitoring a wide
spectrum of applications, along with the challenges of the "big data" streaming
support they often require for data analysis, is nowadays pushing for
increased attention to the emerging edge computing paradigm. In particular,
smart approaches to manage and analyze data directly on the network edge are
increasingly investigated, and Artificial Intelligence (AI) powered edge
computing is envisaged to be a promising direction. In this paper, we focus on
Data Centers (DCs) and Supercomputers (SCs), where a new generation of
high-resolution monitoring systems is being deployed, opening new opportunities
for analyses such as anomaly detection and security, but introducing new
challenges in handling the vast amount of data they produce. In detail, we
report on a novel lightweight and scalable approach to increase the security of
DCs/SCs, which involves AI-powered edge computing on high-resolution power
consumption measurements. The method, called pAElla, targets real-time Malware
Detection (MD); it runs on an out-of-band IoT-based monitoring system for
DCs/SCs, and involves the Power Spectral Density of power measurements, along
with AutoEncoders. Results are promising, with an F1-score close to 1, and
False Alarm and Malware Miss rates close to 0%. We compare our method with
state-of-the-art MD techniques and show that, in the context of DCs/SCs, pAElla
can cover a wider range of malware, significantly outperforming SoA approaches
in terms of accuracy. Moreover, we propose a methodology for online training
suitable for DCs/SCs in production, and release an open dataset and code
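The signal-processing front end described above — spectral features of power traces feeding an anomaly detector — can be sketched as follows. This is a simplified illustration under our own assumptions: a plain periodogram stands in for the PSD estimator, and a deviation-from-baseline threshold stands in for the paper's AutoEncoder reconstruction error; none of the names here are pAElla's actual API.

```python
import numpy as np

def psd(signal):
    """Periodogram estimate of the Power Spectral Density of a power trace."""
    spectrum = np.fft.rfft(signal - signal.mean())
    return (np.abs(spectrum) ** 2) / len(signal)

def detect(trace, baseline_psd, threshold):
    """Flag a trace whose spectrum deviates from the benign baseline
    (a crude stand-in for autoencoder reconstruction error)."""
    err = np.mean((psd(trace) - baseline_psd) ** 2)
    return err > threshold

rng = np.random.default_rng(0)
t = np.arange(1024) / 1024.0
# Benign workload: a slow periodic power pattern plus measurement noise.
benign = 100 + np.sin(2 * np.pi * 5 * t) + 0.1 * rng.standard_normal(1024)
baseline = psd(benign)
# Malware-like workload: extra periodic activity changes the spectrum.
infected = benign + 2.0 * np.sin(2 * np.pi * 50 * t)

print(detect(benign, baseline, threshold=1.0))    # → False
print(detect(infected, baseline, threshold=1.0))  # → True
```

In the paper's setting the traces come from an out-of-band, high-resolution power monitoring system, and an AutoEncoder learns the benign spectral manifold instead of a fixed threshold, which is what allows a wider range of malware to be covered.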
A study into scalable transport networks for IoT deployment
The growth of the internet towards the Internet of Things (IoT) has impacted the way we live. Intelligent (smart) devices which can act autonomously have resulted in new applications, for example industrial automation, smart healthcare systems, and autonomous transportation, to name just a few. These applications have dramatically improved the way we live as citizens. While the internet continues to grow at an unprecedented rate, this growth has been coupled with growing demands for new services, e.g. machine-to-machine (M2M) communications, smart metering, etc. The Transmission Control Protocol/Internet Protocol (TCP/IP) architecture was developed decades ago and was neither prepared nor designed to meet these exponential demands. This has contributed to the complexity of the internet and left it in an inflexible and rigid state. The challenges of reliability, scalability, interoperability, inflexibility and vendor lock-in, amongst many others, remain a concern over existing (traditional) networks. In this study, an evolutionary approach to implementing a "Scalable IoT Data Transmission Network" (S-IoT-N) is proposed, leveraging existing transport networks. Most importantly, the proposed evolutionary approach attempts to address the above challenges by using open (existing) standards and by leveraging traditional transport networks. A Proof-of-Concept (PoC) of the proposed S-IoT-N is attempted on a physical network testbed and is demonstrated along with basic network connectivity services over it. Finally, the results are validated by an experimental performance evaluation of the PoC physical network testbed, along with recommendations for improvement and future work