DESIGN AND ANALYSIS OF LINEAR AND NONLINEAR FILTERS FOR THE FDI OF AIRCRAFT MODEL SENSORS
Increasing demands on reliability for safety-critical systems such as aircraft or spacecraft
require robust control and fault diagnosis capabilities as these systems are potentially
subjected to unexpected anomalies and faults in actuators, input-output sensors, components,
or subsystems. Consequently, fault diagnosis capabilities and requirements for
aerospace applications have recently been receiving a great deal of attention in the research
community.
A fault diagnosis system needs to detect and isolate the presence and location of
faults, also taking into account the control system architecture. The development of
appropriate techniques and solutions for these tasks is known as the fault detection
and isolation (FDI) problem. Several procedures for sensor FDI applied to a nonlinear simulated model
of a commercial aircraft, in the presence of wind gust disturbances and measurement
errors, are presented in this thesis.
The main contributions of this work are related to the design and the optimisation of
two FDI schemes based on a linear polynomial method (PM) and the nonlinear geometric
approach (NLGA). In the NLGA framework, two further FDI techniques are developed;
the first one relies on adaptive filters (NLGA–AF), whilst the second one exploits particle
filters (NLGA–PF).
The suggested design approaches lead to dynamic filters, the so-called residual generators,
that achieve both disturbance decoupling and robustness with respect
to modelling errors and noise. Moreover, the obtained results highlight a good trade-off
between solution complexity and achieved performance.
The FDI strategies are applied to the aircraft model in flight conditions characterised
by tightly-coupled longitudinal and lateral dynamics. The robustness and reliability
properties of the residual generators related to the considered FDI techniques are investigated
and verified by simulating a general aircraft reference trajectory.
Extensive simulations exploiting Monte Carlo analysis are also used for assessing
the overall performance capabilities of the developed FDI schemes in the presence of
both measurement and modelling errors. Comparisons with other disturbance-decoupling
methods for FDI based on neural networks (NN) and the unknown input Kalman filter (UIKF)
are finally reported.
Gauge/gravity duality and the interplay of various fractional branes
We consider different types of fractional branes on a Z_2 orbifold of the
conifold and analyze in detail the corresponding gauge/gravity duality. The
gauge theory possesses a rich and varied dynamics, both in the UV and in the
IR. We find the dual supergravity solution which contains both untwisted and
twisted 3-form fluxes, related to what are known as deformation and N=2
fractional branes respectively. We analyze the resulting RG flow from the
supergravity perspective, by developing an algorithm to easily extract it. We
find hints of a generalization of the familiar cascade of Seiberg dualities due
to a non-trivial interplay between the different types of fractional branes. We
finally consider the IR behavior in several limits, where the dominant
effective dynamics is either confining, in a Coulomb phase or runaway, and
discuss the resolution of singularities in the dual geometric background.
Comment: 38 pages + appendices, 15 figures; v2: refs added and typos corrected
Elliptic non-Abelian Donaldson-Thomas invariants of ℂ^3
We compute the elliptic genus of the D1/D7 brane system in flat space, finding a non-trivial dependence on the number of D7 branes, and provide an F-theory interpretation of the result. We show that the JK residues contributing to the elliptic genus are in one-to-one correspondence with coloured plane partitions and that the elliptic genus can be written as a chiral correlator of vertex operators on the torus. We also study the quantum mechanical system describing D0/D6 bound states on a circle, which leads to a plethystic exponential formula that can be connected to the M-theory graviton index on a multi-Taub-NUT background. The formula is a conjectural expression for higher-rank equivariant K-theoretic Donaldson-Thomas invariants on ℂ^3.
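For orientation, the plethystic exponential mentioned above is the standard operation on a multivariate generating function f vanishing at the origin; this definition is textbook material, not quoted from the paper:

```latex
% Plethystic exponential of f(q_1, ..., q_r) with f(0) = 0:
\mathrm{PE}\big[f(q_1,\dots,q_r)\big]
  = \exp\left( \sum_{n=1}^{\infty} \frac{1}{n}\, f(q_1^n,\dots,q_r^n) \right)
% Example: PE[q] = exp( sum_{n>=1} q^n / n ) = 1/(1-q),
% the partition function of a single bosonic mode.
```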
Living on the walls of super-QCD
We study BPS domain walls in four-dimensional N = 1 massive SQCD with gauge group SU(N) and F < N flavors. We propose a class of three-dimensional Chern-Simons-matter theories to describe the effective dynamics on the walls. Our proposal passes several checks, including the exact matching between its vacua and the solutions to the four-dimensional BPS domain wall equations, which we solve in the small mass regime. As the flavor mass is varied, domain walls undergo a second-order phase transition, where multiple vacua coalesce into a single one. For special values of the parameters, the phase transition exhibits supersymmetry enhancement. Our proposal includes and extends previous results in the literature, providing a complete picture of BPS domain walls for F < N massive SQCD. A similar picture also holds for SQCD with gauge group Sp(N) and F < N + 1 flavors.
Near-Memory Parallel Indexing and Coalescing: Enabling Highly Efficient Indirect Access for SpMV
Sparse matrix-vector multiplication (SpMV) is central to numerous
data-intensive applications, but requires streaming indirect memory accesses
that severely degrade both processing and memory throughput in state-of-the-art
architectures. Near-memory hardware units, decoupling indirect streams from
processing elements, partially alleviate the bottleneck, but rely on low DRAM
access granularity, which is highly inefficient for modern DRAM standards like
HBM and LPDDR. To fully address the end-to-end challenge, we propose a
low-overhead data coalescer combined with a near-memory indirect streaming unit
for AXI-Pack, an extension to the widespread AXI4 protocol packing narrow
irregular stream elements onto wide memory buses. Our combined solution
leverages the memory-level parallelism and coalescence of streaming indirect
accesses in irregular applications like SpMV to maximize the performance and
bandwidth efficiency attained on wide memory interfaces. Our solution delivers
an average speedup of 8x in effective indirect access, often reaching the full
memory bandwidth. As a result, we achieve an average end-to-end speedup on SpMV
of 3x. Moreover, our approach demonstrates remarkable on-chip efficiency,
requiring merely 27kB of on-chip storage and a very compact implementation area
of 0.2-0.3 mm^2 in a 12nm node.
Comment: 6 pages, 6 figures. Submitted to DATE 202
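The streaming indirect access pattern that this work accelerates is visible in a plain CSR SpMV kernel (an illustrative software sketch, not the proposed hardware): the gather `x[cols[j]]` is the irregular, fine-grained access that wastes bandwidth on wide DRAM interfaces.

```python
import numpy as np

# CSR SpMV: y = A @ x, with A stored as (row_ptr, cols, vals).
# The access x[cols[j]] is the indirect (gather) stream that near-memory
# indexing and coalescing units target.

def spmv_csr(row_ptr, cols, vals, x):
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for j in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[j] * x[cols[j]]   # indirect (gather) access
        y[i] = acc
    return y

# 2x3 sparse matrix [[1, 0, 2], [0, 3, 0]] in CSR form
row_ptr = [0, 2, 3]
cols    = [0, 2, 1]
vals    = [1.0, 2.0, 3.0]
x = np.array([1.0, 1.0, 1.0])
y = spmv_csr(row_ptr, cols, vals, x)   # [3.0, 3.0]
```

In hardware terms, the inner-loop gathers touch only a few bytes per DRAM access, which is exactly the granularity mismatch the paper's coalescer addresses.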
Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters
While parallel architectures based on clusters of Processing Elements (PEs)
sharing L1 memory are widespread, there is no consensus on how lean their PE
should be. Architecting PEs as vector processors holds the promise to greatly
reduce their instruction fetch bandwidth, mitigating the Von Neumann Bottleneck
(VNB). However, due to their historical association with supercomputers,
classical vector machines include micro-architectural tricks to improve the
Instruction Level Parallelism (ILP), which increases their instruction fetch
and decode energy overhead. In this paper, we explore for the first time vector
processing as an option to build small and efficient PEs for large-scale
shared-L1 clusters. We propose Spatz, a compact, modular 32-bit vector
processing unit based on the integer embedded subset of the RISC-V Vector
Extension version 1.0. A Spatz-based cluster with four Multiply-Accumulate
Units (MACUs) needs only 7.9 pJ per 32-bit integer multiply-accumulate
operation, 40% less energy than an equivalent cluster built with four Snitch
scalar cores. We analyzed Spatz's performance by integrating it within MemPool,
a large-scale many-core shared-L1 cluster. The Spatz-based MemPool system
achieves up to 285 GOPS when running a 256x256 32-bit integer matrix
multiplication, 70% more than the equivalent Snitch-based MemPool system. In
terms of energy efficiency, the Spatz-based MemPool system achieves up to 266
GOPS/W when running the same kernel, more than twice the energy efficiency of
the Snitch-based MemPool system, which reaches 128 GOPS/W. Those results show
the viability of lean vector processors as high-performance and
energy-efficient PEs for large-scale clusters with tightly-coupled L1 memory.
Comment: 9 pages. Accepted for publication in the 2022 International Conference on Computer-Aided Design (ICCAD 2022).
Ara2: Exploring Single- and Multi-Core Vector Processing with an Efficient RVV1.0 Compliant Open-Source Processor
Vector processing is highly effective in boosting processor performance and
efficiency for data-parallel workloads. In this paper, we present Ara2, the
first fully open-source vector processor to support the RISC-V V 1.0 frozen
ISA. We evaluate Ara2's performance on a diverse set of data-parallel kernels
for various problem sizes and vector-unit configurations, achieving an average
functional-unit utilization of 95% on the most computationally intensive
kernels. We pinpoint performance boosters and bottlenecks, including the scalar
core, memories, and vector architecture, providing insights into the main
vector architecture's performance drivers. Leveraging the openness of the
design, we implement Ara2 in a 22nm technology, characterize its PPA metrics on
various configurations (2-16 lanes), and analyze its microarchitecture and
implementation bottlenecks. Ara2 achieves a state-of-the-art energy efficiency
of 37.8 DP-GFLOPS/W (0.8V) and 1.35GHz of clock frequency (critical path: ~40
FO4 gates). Finally, we explore the performance and energy-efficiency
trade-offs of multi-core vector processors: we find that multiple vector cores
help overcome the scalar core issue-rate bound that limits short-vector
performance. For example, a cluster of eight 2-lane Ara2 (16 FPUs) achieves
more than 3x better performance than a 16-lane single-core Ara2 (16 FPUs) when
executing a 32x32x32 matrix multiplication, with 1.5x improved energy
efficiency.
Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication
From classical HPC to deep learning, MatMul is at the heart of today's
computing. The recent Maddness method approximates MatMul without the need for
multiplication by using a hash-based version of product quantization (PQ)
indexing into a look-up table (LUT). Stella Nera is the first Maddness
accelerator and it achieves 15x higher area efficiency (GMAC/s/mm^2) and more
than 25x higher energy efficiency (TMAC/s/W) than direct MatMul accelerators
implemented in the same technology. The hash function is a decision tree, which
allows for an efficient hardware implementation as the multiply-accumulate
operations are replaced by decision tree passes and LUT lookups. The entire
Maddness MatMul can be broken down into parts that allow an effective
implementation with small computing units and memories, allowing it to reach
extreme efficiency while remaining generically applicable for MatMul tasks. In
a commercial 14nm technology and scaled to 3nm, we achieve an energy efficiency
of 161 TOp/s/W with a Top-1 accuracy on CIFAR-10 of more than 92.5% using ResNet9.
Comment: 6 pages, 7 figures, preprint under review
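The LUT-based product-quantization structure behind Maddness-style MatMul can be sketched as follows (illustrative: nearest-prototype encoding stands in for Maddness's learned decision-tree hash, and all sizes are arbitrary). Rows of A are encoded per subspace to prototype ids; prototype-times-B dot products are precomputed into a look-up table, so the multiply-accumulates become lookups and additions.

```python
import numpy as np

# Approximate A @ B via product quantization:
#   1) split A's columns into C subspaces with K prototypes each,
#   2) encode each row of A to its nearest prototype id per subspace
#      (Maddness replaces this with a cheap decision-tree hash),
#   3) precompute LUT[c][k] = protos[c][k] @ B_slice,
#   4) approximate each output row as a sum of LUT entries.

rng = np.random.default_rng(0)
N, D, M, C, K = 32, 8, 4, 2, 16      # A: N x D, B: D x M
A = rng.normal(size=(N, D))
B = rng.normal(size=(D, M))

sub = D // C                          # dimensions per subspace
protos = [rng.normal(size=(K, sub)) for _ in range(C)]

# Encode: id of the nearest prototype in each subspace (stand-in for the hash)
codes = np.stack([
    np.argmin(((A[:, c*sub:(c+1)*sub, None]
                - protos[c].T[None]) ** 2).sum(axis=1), axis=1)
    for c in range(C)
], axis=1)                            # shape (N, C)

# LUT: prototype dot products with the matching slice of B
lut = np.stack([protos[c] @ B[c*sub:(c+1)*sub] for c in range(C)])  # (C, K, M)

# Approximate product: per row, sum the LUT entries selected by its codes
approx = sum(lut[c][codes[:, c]] for c in range(C))                 # (N, M)
exact = A @ B
```

With trained prototypes (or Maddness's trees) the encoding step needs no multiplications at all, which is what enables the multiplier-free accelerator described above.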
Multi-Complexity-Loss DNAS for Energy-Efficient and Memory-Constrained Deep Neural Networks
Neural Architecture Search (NAS) is increasingly popular to automatically
explore the accuracy versus computational complexity trade-off of Deep Learning
(DL) architectures. When targeting tiny edge devices, the main challenge for DL
deployment is matching the tight memory constraints, hence most NAS algorithms
consider model size as the complexity metric. Other methods reduce the energy
or latency of DL models by trading off accuracy and number of inference
operations. Energy and memory are rarely considered simultaneously, in
particular by low-search-cost Differentiable NAS (DNAS) solutions. We overcome
this limitation by proposing the first DNAS that directly addresses the most
realistic scenario from a designer's perspective: the co-optimization of
accuracy and energy (or latency) under a memory constraint, determined by the
target HW. We do so by combining two complexity-dependent loss functions during
training, with independent strength. Testing on three edge-relevant tasks from
the MLPerf Tiny benchmark suite, we obtain rich Pareto sets of architectures in
the energy vs. accuracy space, with memory footprint constraints spanning from
75% to 6.25% of the baseline networks. When deployed on a commercial edge
device, the STM NUCLEO-H743ZI2, our networks span a range of 2.18x in energy
consumption and 4.04% in accuracy for the same memory constraint, and reduce
energy by up to 2.2x with negligible accuracy drop with respect to the
baseline.
Comment: Accepted for publication at the ISLPED 2022 ACM/IEEE International Symposium on Low Power Electronics and Design.
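The combination of two complexity-dependent loss terms with independent strengths can be sketched as below (hypothetical proxies and weights, not the paper's exact formulation): a differentiable energy or latency estimate is co-optimized with the task loss, while memory enters as a hinge penalty that only activates above the hardware budget.

```python
# Sketch of a multi-complexity DNAS objective: task loss plus an energy
# (or latency) term, plus a memory hinge that is zero while the candidate
# architecture fits the target hardware's budget.

def dnas_loss(task_loss, energy, mem, mem_budget,
              lam_energy=0.1, lam_mem=1.0):
    """Two complexity-dependent terms with independent strengths."""
    mem_penalty = max(0.0, mem - mem_budget)   # active only above budget
    return task_loss + lam_energy * energy + lam_mem * mem_penalty

# Within budget: only the energy term is added to the task loss.
in_budget = dnas_loss(task_loss=1.0, energy=2.0, mem=50, mem_budget=100)
# Over budget: the memory hinge dominates, steering the search back inside.
over_budget = dnas_loss(task_loss=1.0, energy=2.0, mem=150, mem_budget=100)
```

Sweeping `lam_energy` while keeping the memory hinge fixed is one way to trace out Pareto sets like those reported in the paper; in an actual DNAS the proxies must be differentiable functions of the architecture parameters.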