Generating and auto-tuning parallel stencil codes
In this thesis, we present a software framework, Patus, which generates high performance stencil codes for different types of hardware platforms, including current multicore CPU and graphics processing unit architectures. The ultimate goals of the framework are productivity, portability (of both the code and performance), and high performance on the target platform.
A stencil computation updates every grid point in a structured grid based on the values of its neighboring points. This class of computations occurs frequently in scientific and general purpose computing (e.g., in partial differential equation solvers or in image processing), justifying the focus on this kind of computation.
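To make the stencil pattern concrete, here is a minimal sketch in plain Python. It is not Patus DSL syntax and not code generated by Patus: it shows one sweep of a 2D 5-point averaging stencil of the kind found in Jacobi-type PDE solvers. The function name `jacobi_step` and the fixed-boundary convention are illustrative assumptions.

```python
from typing import List

Grid = List[List[float]]


def jacobi_step(grid: Grid) -> Grid:
    """Apply one 5-point averaging stencil sweep to the interior of a 2D grid.

    Hypothetical helper for illustration. Boundary points are copied
    unchanged (fixed boundary values).
    """
    rows, cols = len(grid), len(grid[0])
    # Write into a copy so every update reads only values from the old grid.
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # New value = average of the four direct neighbours.
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new
```

The doubly nested loop over the interior is the loop nest that a stencil code generator would typically block, vectorize, or parallelize for a given target.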
The proposed key ingredients to achieve the goals of productivity, portability, and performance are domain specific languages (DSLs) and the auto-tuning methodology.
The Patus stencil specification DSL allows the programmer to express a stencil computation concisely and independently of hardware architecture-specific details. Thus, it increases programmer productivity by relieving the programmer of low-level programming model issues and of manually applying hardware platform-specific code optimization techniques. The use of domain specific languages also implies code reusability: once implemented, the same stencil specification can be reused on different hardware platforms, i.e., the specification code is portable across hardware architectures. Gearing the language towards a special purpose makes it amenable to more aggressive optimizations and therefore to potentially higher performance.
Auto-tuning provides performance and performance portability by automatically adapting implementation-specific parameters to the characteristics of the hardware on which the code will run. By automating the process of parameter tuning (which essentially amounts to solving an integer programming problem in which the objective function is the number representing the code's performance as a function of the parameter configuration), the system can also be used more productively than if the programmer had to fine-tune the code manually.
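The tuning loop described above can be sketched as follows, under clearly stated assumptions. The abstract does not say which search strategy Patus uses, so this shows the simplest possible one: an exhaustive sweep over a small discrete parameter space. `exhaustive_tune`, the parameter names `block_x`/`block_y`, and the synthetic performance model are all hypothetical. In real use, `measure` would compile and time a generated kernel.

```python
import itertools
from typing import Callable, Dict, Optional, Sequence, Tuple

Config = Dict[str, int]


def exhaustive_tune(space: Dict[str, Sequence[int]],
                    measure: Callable[[Config], float]) -> Tuple[Optional[Config], float]:
    """Evaluate every configuration in `space` and return the best one.

    `measure` maps a configuration to a performance number (higher is
    better), i.e. the objective function of the integer program.
    """
    names = list(space)
    best_cfg: Optional[Config] = None
    best_perf = float("-inf")
    for values in itertools.product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        perf = measure(cfg)
        if perf > best_perf:
            best_cfg, best_perf = cfg, perf
    return best_cfg, best_perf


def synthetic_perf(cfg: Config) -> float:
    """Hypothetical stand-in for a timed benchmark run; peaks at 32 x 8."""
    return 100.0 - abs(cfg["block_x"] - 32) - 2 * abs(cfg["block_y"] - 8)


if __name__ == "__main__":
    space = {"block_x": [8, 16, 32, 64], "block_y": [4, 8, 16]}
    print(exhaustive_tune(space, synthetic_perf))
```

Each additional parameter multiplies the number of configurations, so practical auto-tuners typically replace the exhaustive sweep with a heuristic search over the same objective.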
We show performance results for a variety of stencils for which Patus was used to generate the corresponding implementations. The selection includes stencils taken from two real-world applications: a simulation of the temperature within the human body during hyperthermia cancer treatment and a seismic application. These examples demonstrate the framework's flexibility and its ability to produce high performance code.
Data-Driven Methods for Data Center Operations Support
During the last decade, cloud technologies have been evolving at an impressive pace, such that we are now living in a cloud-native era where developers can leverage an unprecedented landscape of (possibly managed) services for orchestration, compute, storage, load balancing, monitoring, etc. On-demand access to a diverse set of configurable virtualized resources makes it possible to build more elastic, flexible and highly resilient distributed applications. Behind the scenes, cloud providers bear the heavy burden of maintaining the underlying infrastructures, which consist of large-scale distributed systems, partitioned and replicated among many geographically distributed data centers to guarantee scalability, robustness to failures, high availability and low latency. The larger the scale, the more complex the interactions among the various components that cloud providers have to deal with, so that monitoring, diagnosing and troubleshooting issues become extremely daunting tasks.
To keep up with these challenges, development and operations practices have undergone significant transformations, especially in terms of improving the automation that makes releasing new software, and responding to unforeseen issues, faster and sustainable at scale. The resulting paradigm is nowadays referred to as DevOps. However, while such automation can be very sophisticated, traditional DevOps practices fundamentally rely on reactive mechanisms that typically require careful manual tuning and supervision from human experts. To minimize the risk of outages (and the related costs), it is crucial to provide DevOps teams with suitable tools that enable a proactive approach to data center operations.
This work presents a comprehensive data-driven framework to address the most relevant problems that can be experienced in large-scale distributed cloud infrastructures. These environments are characterized by an abundance of diverse data, collected at each level of the stack, such as: time series (e.g., physical host measurements, virtual machine or container metrics, networking component logs, application KPIs); graphs (e.g., network topologies, fault graphs reporting dependencies among hardware and software components, performance issue propagation networks); and text (e.g., source code, system logs, version control system history, code review feedback). Such data are also typically updated at a relatively high frequency and are subject to distribution drift caused by continuous configuration changes to the underlying infrastructure. In such a highly dynamic scenario, traditional model-driven approaches alone may be inadequate at capturing the complexity of the interactions among system components. DevOps teams would certainly benefit from robust data-driven methods that support their decisions based on historical information. For instance, effective anomaly detection capabilities may also help in conducting more precise and efficient root-cause analysis. Likewise, accurate forecasting and intelligent control strategies would improve resource management.
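As a minimal illustration of what anomaly detection on a metric time series means, the sketch below flags points that deviate strongly from a trailing window of recent values. This rolling z-score rule is a simple statistical baseline chosen for clarity. It is not one of the Deep Learning-based methods this work proposes. `rolling_zscore_anomalies`, the window size, and the threshold are illustrative assumptions.

```python
from collections import deque
from statistics import mean, pstdev
from typing import List, Sequence


def rolling_zscore_anomalies(series: Sequence[float],
                             window: int = 5,
                             threshold: float = 3.0) -> List[int]:
    """Return indices whose value lies more than `threshold` standard
    deviations away from the mean of the preceding `window` values.

    Hypothetical baseline detector, for illustration only.
    """
    history: deque = deque(maxlen=window)
    anomalies: List[int] = []
    for i, x in enumerate(series):
        if len(history) == window:
            recent = list(history)
            mu, sigma = mean(recent), pstdev(recent)
            # Skip perfectly flat windows, where the z-score is undefined.
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                anomalies.append(i)
        history.append(x)  # the window slides forward, anomalies included
    return anomalies


cpu_load = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10]
print(rolling_zscore_anomalies(cpu_load))  # → [7]
```

Once the spike enters the window, it inflates the baseline for the next few points. This is one reason simple detectors like this struggle with the drifting, high-dimensional data described above.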
Given their ability to deal with high-dimensional, complex data, Deep Learning-based methods are the most straightforward option for realizing the aforementioned support tools. On the other hand, because of their complexity, such models often require huge processing power and suitable hardware to be operated effectively at scale. These aspects must be carefully addressed when applying such methods in the context of data center operations: automated operations approaches must be dependable and cost-efficient, so as not to degrade the services they are built to improve.
Tiny Machine Learning Environment: Enabling Intelligence on Constrained Devices
Running machine learning (ML) algorithms on constrained devices at the extreme edge of the network is problematic due to the computational overhead of ML algorithms, the limited resources available on the embedded platform, and the application budget (i.e., real-time requirements, power constraints, etc.). This has required the development of specific solutions and development tools for what is now referred to as TinyML. In this dissertation, we focus on improving the deployment and performance of TinyML applications, taking into consideration the aforementioned challenges, especially memory requirements.
This dissertation contributes to the construction of the Edge Learning Machine environment (ELM), a platform-independent open-source framework that provides three main TinyML services, namely shallow ML, self-supervised ML, and binary deep learning on constrained devices. In this context, this work includes the following steps, which are reflected in the thesis structure. First, we present a performance analysis of state-of-the-art shallow ML algorithms, including dense neural networks, implemented on mainstream microcontrollers. The comprehensive analysis, in terms of algorithms, hardware platforms, datasets, preprocessing techniques, and configurations, shows performance similar to that of a desktop machine and highlights the impact of these factors on overall performance. Second, despite the common assumption that the scarcity of resources restricts TinyML to model inference, we have gone a step further and enabled self-supervised on-device training on microcontrollers and tiny IoT devices by developing the Autonomous Edge Pipeline (AEP) system. AEP achieves accuracy comparable to that of the typical TinyML paradigm, i.e., models trained on resource-abundant devices and then deployed on microcontrollers. Next, we present a memory allocation strategy for convolutional neural network (CNN) layers that optimizes memory requirements. This approach reduces the memory footprint without affecting either accuracy or latency. Moreover, e-skin systems share the main requirements of the TinyML field: enabling intelligence with low memory, low power consumption, and low latency. Therefore, we designed an efficient Tiny CNN architecture for e-skin applications. The architecture leverages the memory allocation strategy presented earlier and provides better performance than existing solutions.
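The abstract does not detail the proposed memory allocation strategy. The following is therefore only a sketch of a generic, widely used idea that such a strategy can be compared against: reusing activation buffers across the layers of a purely sequential CNN. The function names and the activation sizes in the example are hypothetical.

```python
from typing import List


def naive_footprint(act_sizes: List[int]) -> int:
    """Every activation tensor gets its own buffer for the whole inference."""
    return sum(act_sizes)


def ping_pong_footprint(act_sizes: List[int]) -> int:
    """Two buffers alternate: activation k lives in buffer k % 2.

    Valid only for a purely sequential network in which layer k reads
    activation k and writes activation k + 1 (no skip connections, no
    in-place layers).
    """
    even = max(act_sizes[0::2], default=0)
    odd = max(act_sizes[1::2], default=0)
    return even + odd


def peak_live_bytes(act_sizes: List[int]) -> int:
    """Lower bound: while a layer runs, its input and output must coexist."""
    if len(act_sizes) < 2:
        return sum(act_sizes)
    return max(a + b for a, b in zip(act_sizes, act_sizes[1:]))


# Hypothetical activation sizes in bytes: input, three hidden layers, logits.
sizes = [3072, 16384, 8192, 4096, 10]
print(naive_footprint(sizes), ping_pong_footprint(sizes), peak_live_bytes(sizes))
```

For these sizes, two alternating buffers reduce activation memory from 31754 to 24576 bytes, which here matches the lower bound set by the largest adjacent input/output pair. Networks with skip connections would need a liveness analysis rather than simple alternation.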
A major contribution of the thesis is CBin-NN, a library of functions for implementing extremely efficient binary neural networks on constrained devices. The library outperforms state-of-the-art NN deployment solutions by drastically reducing memory footprint and inference latency. All the solutions proposed in this thesis have been implemented on representative devices and tested in relevant applications, and the results are reported and discussed. The ELM framework is open source, and this work is becoming a useful, versatile toolkit for the IoT and TinyML research and development community.
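Binary neural networks are cheap on microcontrollers mainly because a dot product between two ±1 vectors reduces to an XNOR followed by a population count. The sketch below shows that arithmetic in Python. It does not reflect CBin-NN's actual API or data layout, and `pack_signs`/`binary_dot` are hypothetical names. With bit 1 encoding +1 and bit 0 encoding −1, each matching position contributes +1 and each mismatch −1, so the dot product equals 2·matches − n.

```python
from typing import Sequence


def pack_signs(values: Sequence[int]) -> int:
    """Pack +1/-1 values into an int: bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binary networks use only +1/-1 values")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two length-n +1/-1 vectors from their packed bits."""
    mask = (1 << n) - 1
    # XNOR marks positions where the signs agree; popcount counts them.
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * matches - n


a = pack_signs([1, -1, 1, 1])
b = pack_signs([1, 1, -1, 1])
print(binary_dot(a, b, 4))  # → 0
```

On a microcontroller, the same two operations map to a bitwise instruction and a popcount over 32-bit words, replacing 32 multiply-accumulates per word.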
Co-simulation techniques based on virtual platforms for SoC design and verification in power electronics applications
In recent decades, investment in the energy sector has increased considerably. Today, numerous companies are developing equipment such as power converters or electrical machines with state-of-the-art control systems. The current trend is to use Systems-on-Chip and Field Programmable Gate Arrays to implement the entire control system. These devices facilitate the use of more complex and efficient control algorithms, improving the efficiency of the equipment and enabling the integration of renewable energy systems into the electrical grid. However, the complexity of control systems has also increased considerably, and with it the difficulty of verifying them.
Hardware-in-the-loop (HIL) systems have been presented as a solution for the non-destructive verification of power equipment, avoiding accidents and costly tests on test benches. HIL systems simulate in real time the behavior of the power stage (the plant) and its interface, so that tests with the control board can be carried out in a safe environment.
This thesis focuses on improving the verification process of control systems in power electronics applications. The general contribution is to provide an alternative to the use of HIL for verifying the hardware/software of the control board. The alternative is based on the Software-in-the-loop (SIL) technique and seeks to overcome or address the limitations of SIL identified to date.
To improve the capabilities of SIL, a software tool called COSIL has been developed. It makes it possible to co-simulate the final implementation and integration of the control system, whether software (CPU), hardware (FPGA), or a mix of software and hardware, together with its interaction with the power stage. The platform can work at multiple levels of abstraction and includes support for mixed co-simulation in different languages such as C or VHDL.
Throughout the thesis, emphasis is placed on mitigating one of the limitations of SIL: its low simulation speed. Several solutions are proposed, such as the use of software emulators, different levels of abstraction for software and hardware, or local clocks in the FPGA modules. In particular, the thesis contributes an external synchronization mechanism for the QEMU software emulator that enables multi-core emulation. This contribution makes it possible to use QEMU in co-simulation virtual platforms such as COSIL.
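The abstract does not describe how the external synchronization of QEMU works. As a generic illustration of the underlying idea, the sketch below shows quantum-based lockstep co-simulation: each simulator advances by a fixed time quantum, and then all of them meet at a barrier where data is exchanged. `ToyComponent`, `run_lockstep`, and the nanosecond time base are hypothetical and are not QEMU or COSIL interfaces.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyComponent:
    """Hypothetical stand-in for one simulator (e.g. an emulated CPU or an HDL model)."""
    name: str
    time_ns: int = 0
    inbox: List[int] = field(default_factory=list)

    def advance_to(self, t_ns: int) -> int:
        # A real simulator would execute instructions or delta cycles here.
        self.time_ns = t_ns
        return t_ns  # value published to the peer at the barrier


def run_lockstep(a: ToyComponent, b: ToyComponent,
                 quantum_ns: int, end_ns: int) -> int:
    """Advance both components quantum by quantum; return the barrier count."""
    barriers = 0
    t = 0
    while t < end_ns:
        t = min(t + quantum_ns, end_ns)
        out_a = a.advance_to(t)
        out_b = b.advance_to(t)
        # Barrier: both sides have reached time t, so neither receives data
        # produced at a simulated time later than its own clock.
        a.inbox.append(out_b)
        b.inbox.append(out_a)
        barriers += 1
    return barriers
```

A larger quantum means fewer barriers and faster simulation, at the cost of coarser timing for interactions between the simulated CPU, the FPGA logic, and the plant.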
The entire COSIL platform, including the use of QEMU, has been analyzed with different types of applications and in a real industrial project. Its use was critical for developing and verifying the software and hardware of the control system of a 400 kVA converter.
Neuro-Fuzzy Based Intelligent Approaches to Nonlinear System Identification and Forecasting
Nearly three decades ago, nonlinear system identification consisted of several ad hoc approaches, which were restricted to a very limited class of systems. However, with the advent of various soft computing methodologies, such as neural networks and fuzzy logic, combined with optimization techniques, a wider class of systems can be handled at present. Complex systems may differ widely in character and nature: they may be linear or nonlinear, continuous or discrete, time-varying or time-invariant, static or dynamic, short-term or long-term, centralized or distributed, predictable or unpredictable, ill-defined or well-defined. Neurofuzzy hybrid modelling approaches have been developed as a technique well suited to utilising both linguistic values and numerical data. This Thesis is focused on the development of advanced neurofuzzy modelling architectures and their application to real case studies. Three requirements have been identified as desirable characteristics for such a design: a model needs to have a minimum number of rules; a model needs to be generic, acting either as a Multi-Input-Single-Output (MISO) or a Multi-Input-Multi-Output (MIMO) identification model; and a model needs to have a versatile nonlinear membership function.
Initially, a MIMO Adaptive Fuzzy Logic System (AFLS) model has been developed for the detection of meat spoilage using Fourier transform infrared (FTIR) spectroscopy. It incorporates a prototype defuzzification scheme and utilises a fuzzification layer that is more efficient than those of Takagi–Sugeno–Kang (TSK) based systems. The identification strategy involved not only the classification of beef fillet samples into their respective quality class (i.e., fresh, semi-fresh and spoiled), but also the simultaneous prediction of their associated microbiological population directly from FTIR spectra. In AFLS, the number of membership functions for each input variable is directly associated with the number of rules; hence, the “curse of dimensionality” problem was significantly reduced. Results confirmed the advantage of the proposed scheme over the Adaptive Neurofuzzy Inference System (ANFIS), Multilayer Perceptron (MLP) and Partial Least Squares (PLS) techniques used in the same case study.
In the case of MISO systems, the TSK-based structure has been utilized in many neurofuzzy systems, such as ANFIS. At the next stage of research, an Adaptive Fuzzy Inference Neural Network (AFINN) has been developed for monitoring the spoilage of minced beef using multispectral imaging information. This model, which follows the TSK structure, incorporates a clustering pre-processing stage for the definition of fuzzy rules, while its final fuzzy rule base is determined by competitive learning. In this specific case study, the AFINN model was also able to predict, for the first time in the literature, the beef’s temperature directly from imaging information. Results again demonstrated the superiority of the adopted model.
By extending this line of research and adopting specific design concepts from the previous case studies, the Asymmetric Gaussian Fuzzy Inference Neural Network (AGFINN) architecture has been developed according to the design principles above. A clustering pre-processing scheme has been applied to minimise the number of fuzzy rules. AGFINN incorporates features from the AFLS concept by having the same number of rules as fuzzy memberships. Instead of the widely used standard symmetric Gaussian membership function, AGFINN utilizes an asymmetric function as the input linguistic node. Since the asymmetric Gaussian membership function has greater variability and flexibility than the traditional one, it can partition the input space more effectively. AGFINN can be built either as a MISO or as a MIMO system. In the MISO case, a TSK defuzzification scheme has been implemented, together with two different learning algorithms. AGFINN has been tested on real datasets related to electricity price forecasting for the ISO New England Power Distribution System. Its performance was compared against a number of alternative models, including ANFIS, AFLS, MLP and Wavelet Neural Network (WNN), and proved to be superior. The concept of asymmetric functions proved to be a valid hypothesis, and it could also be applied to other architectures, such as Fuzzy Wavelet Neural Network models, by designing a suitable flexible wavelet membership function. AGFINN’s MIMO characteristics also make the proposed architecture suitable for a larger range of applications and problems.
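An asymmetric Gaussian membership function can be written with separate widths on either side of its center: μ(x) = exp(−½((x − c)/σ_L)²) for x < c and μ(x) = exp(−½((x − c)/σ_R)²) otherwise. The sketch below uses that common split-width form together with zero-order TSK defuzzification (a firing-strength-weighted average of rule outputs) for a single input. The abstract does not give AGFINN's exact parameterization or learning rules, so the form, the function names, and the example rules are assumptions.

```python
import math
from typing import List, Tuple

# (center, sigma_left, sigma_right, consequent) for a one-input rule.
Rule = Tuple[float, float, float, float]


def asymmetric_gaussian(x: float, center: float,
                        sigma_left: float, sigma_right: float) -> float:
    """Membership degree in (0, 1]; the spread differs on each side of the center."""
    if sigma_left <= 0 or sigma_right <= 0:
        raise ValueError("widths must be positive")
    sigma = sigma_left if x < center else sigma_right
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def tsk_zero_order(x: float, rules: List[Rule]) -> float:
    """Zero-order TSK output: rule consequents weighted by firing strength."""
    weights = [asymmetric_gaussian(x, c, sl, sr) for c, sl, sr, _ in rules]
    total = sum(weights)
    if total == 0:
        raise ValueError("no rule fires for this input")
    return sum(w * rule[3] for w, rule in zip(weights, rules)) / total


# A membership that falls off quickly to the left and slowly to the right.
print(asymmetric_gaussian(1.0, 2.0, 1.0, 3.0), asymmetric_gaussian(5.0, 2.0, 1.0, 3.0))
```

With σ_L = 1 and σ_R = 3, the membership drops to the same level one unit left of the center as three units to the right. This extra degree of freedom lets a single rule cover a skewed region of the input space.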
NASA SBIR abstracts of 1991 phase 1 projects
The objectives of 301 projects placed under contract by the Small Business Innovation Research (SBIR) program of the National Aeronautics and Space Administration (NASA) are described. These projects were selected competitively from among proposals submitted to NASA in response to the 1991 SBIR Program Solicitation. The basic document consists of edited, non-proprietary abstracts of the winning proposals submitted by small businesses. The abstracts are presented under the 15 technical topics within which Phase 1 proposals were solicited. Each project was assigned a sequential identifying number from 001 to 301, in order of its appearance in the body of the report. Appendixes are included that provide additional information about the SBIR program and permit cross-referencing of the 1991 Phase 1 projects by company name, location by state, principal investigator, NASA Field Center responsible for managing each project, and NASA contract number.
Fast algorithm for real-time rings reconstruction
The GAP project is dedicated to studying the application of GPUs in several contexts in which real-time response is important for decision making. The definition of real time depends on the application under study, ranging from response times of μs up to several hours for very computing-intensive tasks. At this conference, we presented our work on low-level triggers [1] [2] and high-level triggers [3] in high-energy physics experiments, and specific applications for nuclear magnetic resonance (NMR) [4] [5] and cone-beam CT [6]. Apart from the study of dedicated solutions to decrease the latency due to data transport and preparation, the computing algorithms play an essential role in any GPU application. In this contribution, we present an original algorithm developed for trigger applications to accelerate ring reconstruction in the RICH detector when seeds for the reconstruction from external trackers are not available.
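The abstract does not describe the algorithm itself. For orientation only, the sketch below shows a classical seedless building block for ring finding: the algebraic (Kåsa) least-squares circle fit. It recovers a ring's center and radius from hit coordinates without any external seed by solving a 3×3 linear system. It handles a single ring without noise hits, which is the easy case. `kasa_circle_fit` is a hypothetical name, and this is not presented as the contribution's method.

```python
import math
from typing import List, Sequence, Tuple


def _det3(a: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve m @ p = v with Cramer's rule."""
    d = _det3(m)
    # Scale-dependent threshold chosen for illustration only.
    if abs(d) < 1e-12:
        raise ValueError("degenerate hit configuration (e.g. collinear hits)")
    sol = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = v[i]
        sol.append(_det3(mk) / d)
    return sol


def kasa_circle_fit(hits: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Fit x^2 + y^2 + D*x + E*y + F = 0 to the hits by linear least squares.

    Returns (center_x, center_y, radius). No seed is needed.
    """
    if len(hits) < 3:
        raise ValueError("need at least 3 hits")
    # Normal equations A^T A p = A^T t for rows [x, y, 1], targets -(x^2 + y^2).
    m = [[0.0] * 3 for _ in range(3)]
    v = [0.0] * 3
    for x, y in hits:
        row = (x, y, 1.0)
        t = -(x * x + y * y)
        for i in range(3):
            v[i] += row[i] * t
            for j in range(3):
                m[i][j] += row[i] * row[j]
    d_coef, e_coef, f_coef = _solve3(m, v)
    cx, cy = -d_coef / 2.0, -e_coef / 2.0
    return cx, cy, math.sqrt(cx * cx + cy * cy - f_coef)
```

Because the fit is linear in (D, E, F), it needs no iteration or starting point. Separating hits from several overlapping rings and from noise is the difficult part that a trigger-level algorithm must still address.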
12th EASN International Conference on "Innovation in Aviation & Space for opening New Horizons"
Epoxy resins show a combination of thermal stability, good mechanical performance, and durability, which makes these materials suitable for many applications in the aerospace industry. Different types of curing agents can be utilized for curing epoxy systems. Aliphatic amines are preferable to the toxic aromatic ones as curing agents, though their incorporation increases the flammability of the resin. Recently, we have developed different hybrid strategies in which the sol-gel technique has been exploited in combination either with two DOPO-based flame retardants and other synergists, or with humic acid and ammonium polyphosphate, to achieve non-dripping V-0 classification in UL 94 vertical flame spread tests with low phosphorus loadings (e.g., 1-2 wt%). These strategies improved the flame retardancy of the epoxy matrix without any detrimental impact on the mechanical and thermal properties of the composites. Finally, the formation of a hybrid silica-epoxy network accounted for the establishment of tailored interphases, due to a better dispersion of more polar additives in the hydrophobic resin.