
    DESIGN AND ANALYSIS OF LINEAR AND NONLINEAR FILTERS FOR THE FDI OF AIRCRAFT MODEL SENSORS

    Increasing demands on reliability for safety-critical systems such as aircraft or spacecraft require robust control and fault diagnosis capabilities, as these systems are potentially subject to unexpected anomalies and faults in actuators, input-output sensors, components, or subsystems. Consequently, fault diagnosis capabilities and requirements for aerospace applications have recently been receiving a great deal of attention in the research community. A fault diagnosis system needs to detect the presence of faults and isolate their location, taking into account also the control system architecture. The development of appropriate techniques and solutions for these tasks is known as the fault detection and isolation (FDI) problem. This thesis presents several procedures for sensor FDI applied to a nonlinear simulated model of a commercial aircraft, in the presence of wind gust disturbances and measurement errors. The main contributions of this work are the design and optimisation of two FDI schemes based on a linear polynomial method (PM) and the nonlinear geometric approach (NLGA). In the NLGA framework, two further FDI techniques are developed: the first relies on adaptive filters (NLGA-AF), whilst the second exploits particle filters (NLGA-PF). The suggested design approaches lead to dynamic filters, the so-called residual generators, that achieve both disturbance decoupling and robustness with respect to modelling errors and noise. Moreover, the obtained results highlight a good trade-off between solution complexity and achieved performance. The FDI strategies are applied to the aircraft model in flight conditions characterised by tightly coupled longitudinal and lateral dynamics. The robustness and reliability of the residual generators associated with the considered FDI techniques are investigated and verified by simulating a general aircraft reference trajectory. Extensive Monte Carlo simulations are also used to assess the overall performance of the developed FDI schemes in the presence of both measurement and modelling errors. Comparisons with other disturbance-decoupling FDI methods based on neural networks (NN) and the unknown input Kalman filter (UIKF) are finally reported.
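
    To make the residual-generation idea concrete, the sketch below shows a minimal residual-based fault detector in C: compare each measured sensor output with a model-based estimate and flag a fault when the residual persistently exceeds a threshold. This is an illustrative, hypothetical fragment (names and structure are ours), not the PM or NLGA filters designed in the thesis, which are dynamic and disturbance-decoupled.

        #include <math.h>
        #include <stdbool.h>

        /* Minimal residual-based fault detector: compares a measured sensor
         * value against a model-based estimate and flags a fault when the
         * residual stays above a threshold for several samples. */
        typedef struct {
            double threshold;  /* detection threshold, e.g. tuned via Monte Carlo runs */
            int    hold;       /* consecutive out-of-band samples needed to declare a fault */
            int    count;      /* internal counter */
        } fdi_detector_t;

        bool fdi_step(fdi_detector_t *d, double y_measured, double y_estimated)
        {
            double residual = y_measured - y_estimated;   /* residual generation */
            if (fabs(residual) > d->threshold) {
                if (++d->count >= d->hold) return true;   /* fault detected */
            } else {
                d->count = 0;                             /* residual back in band */
            }
            return false;
        }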

    Gauge/gravity duality and the interplay of various fractional branes

    We consider different types of fractional branes on a Z_2 orbifold of the conifold and analyze in detail the corresponding gauge/gravity duality. The gauge theory possesses a rich and varied dynamics, both in the UV and in the IR. We find the dual supergravity solution, which contains both untwisted and twisted 3-form fluxes, related to what are known as deformation and N=2 fractional branes respectively. We analyze the resulting RG flow from the supergravity perspective, developing an algorithm to extract it easily. We find hints of a generalization of the familiar cascade of Seiberg dualities due to a non-trivial interplay between the different types of fractional branes. We finally consider the IR behavior in several limits, where the dominant effective dynamics is confining, in a Coulomb phase, or runaway, and discuss the resolution of singularities in the dual geometric background. Comment: 38 pages + appendices, 15 figures; v2: refs added and typos corrected.
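
    For orientation, the "familiar cascade" mentioned above is the conifold duality cascade, in which a Seiberg duality on the higher-rank gauge factor repeatedly lowers the ranks. A schematic statement of one cascade step, in standard textbook form rather than anything specific to this paper, is:

        % One step of the familiar conifold (Klebanov-Strassler) cascade:
        % Seiberg duality on the higher-rank node shifts the ranks by M.
        SU(N+M) \times SU(N) \;\longrightarrow\; SU(N-M) \times SU(N)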

    Elliptic non-Abelian Donaldson-Thomas invariants of ℂ³

    We compute the elliptic genus of the D1/D7 brane system in flat space, finding a non-trivial dependence on the number of D7 branes, and provide an F-theory interpretation of the result. We show that the JK-residues contributing to the elliptic genus are in one-to-one correspondence with coloured plane partitions and that the elliptic genus can be written as a chiral correlator of vertex operators on the torus. We also study the quantum mechanical system describing D0/D6 bound states on a circle, which leads to a plethystic exponential formula that can be connected to the M-theory graviton index on a multi-Taub-NUT background. The formula is a conjectural expression for higher-rank equivariant K-theoretic Donaldson-Thomas invariants on ℂ³.
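
    Since the main result is packaged as a plethystic exponential, it may help to recall the standard definition of that operation (a general identity, not the paper's specific formula): for a function f of fugacities,

        % Plethystic exponential of f(q, y, ...):
        \mathrm{PE}\bigl[f(q, y)\bigr] \;=\; \exp\left( \sum_{n=1}^{\infty} \frac{1}{n}\, f(q^{n}, y^{n}) \right)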

    Living on the walls of super-QCD

    We study BPS domain walls in four-dimensional N = 1 massive SQCD with gauge group SU(N) and F < N flavors. We propose a class of three-dimensional Chern-Simons-matter theories to describe the effective dynamics on the walls. Our proposal passes several checks, including the exact matching between its vacua and the solutions to the four-dimensional BPS domain wall equations, which we solve in the small-mass regime. As the flavor mass is varied, domain walls undergo a second-order phase transition, where multiple vacua coalesce into a single one. For special values of the parameters, the phase transition exhibits supersymmetry enhancement. Our proposal includes and extends previous results in the literature, providing a complete picture of BPS domain walls for F < N massive SQCD. A similar picture also holds for SQCD with gauge group Sp(N) and F < N + 1 flavors.
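
    As background for the BPS wall equations referred to above, the generic four-dimensional N = 1 relations for a wall profile interpolating between two vacua (textbook Wess-Zumino form with Kahler metric K and superpotential W, not this paper's specific SQCD equations) read:

        % BPS flow equation and wall tension; \Delta W is the jump of the
        % superpotential between the two vacua connected by the wall.
        \partial_z \phi^{i} = e^{i\gamma}\, K^{i\bar{\jmath}}\, \partial_{\bar{\jmath}} \overline{W},
        \qquad
        T_{\mathrm{wall}} = 2\,\bigl|\Delta W\bigr|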

    Near-Memory Parallel Indexing and Coalescing: Enabling Highly Efficient Indirect Access for SpMV

    Sparse matrix vector multiplication (SpMV) is central to numerous data-intensive applications, but requires streaming indirect memory accesses that severely degrade both processing and memory throughput in state-of-the-art architectures. Near-memory hardware units, decoupling indirect streams from processing elements, partially alleviate the bottleneck, but rely on low DRAM access granularity, which is highly inefficient for modern DRAM standards like HBM and LPDDR. To fully address the end-to-end challenge, we propose a low-overhead data coalescer combined with a near-memory indirect streaming unit for AXI-Pack, an extension to the widespread AXI4 protocol that packs narrow irregular stream elements onto wide memory buses. Our combined solution leverages the memory-level parallelism and coalescence of streaming indirect accesses in irregular applications like SpMV to maximize the performance and bandwidth efficiency attained on wide memory interfaces. Our solution delivers an average speedup of 8x in effective indirect access, often reaching the full memory bandwidth. As a result, we achieve an average end-to-end speedup on SpMV of 3x. Moreover, our approach demonstrates remarkable on-chip efficiency, requiring merely 27 kB of on-chip storage and a very compact implementation area of 0.2-0.3 mm^2 in a 12 nm node. Comment: 6 pages, 6 figures. Submitted to DATE 202
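
    To make the access pattern concrete, a plain CSR SpMV kernel is sketched below in generic textbook form (not the paper's hardware or code); the x[col_idx[j]] gather is the streaming indirect access that the proposed near-memory unit decouples and coalesces.

        /* CSR sparse matrix-vector multiplication y = A * x.
         * The gather x[col_idx[j]] is an indirect, data-dependent access:
         * on conventional systems it wastes most of each wide DRAM burst. */
        void spmv_csr(int n_rows,
                      const int *row_ptr,    /* n_rows + 1 entries      */
                      const int *col_idx,    /* one entry per nonzero   */
                      const double *values,  /* one entry per nonzero   */
                      const double *x,
                      double *y)
        {
            for (int i = 0; i < n_rows; ++i) {
                double acc = 0.0;
                for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j) {
                    acc += values[j] * x[col_idx[j]];   /* indirect (gather) access */
                }
                y[i] = acc;
            }
        }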

    Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters

    While parallel architectures based on clusters of Processing Elements (PEs) sharing L1 memory are widespread, there is no consensus on how lean their PEs should be. Architecting PEs as vector processors holds the promise of greatly reducing their instruction fetch bandwidth, mitigating the Von Neumann Bottleneck (VNB). However, due to their historical association with supercomputers, classical vector machines include micro-architectural tricks to improve Instruction Level Parallelism (ILP), which increase their instruction fetch and decode energy overhead. In this paper, we explore for the first time vector processing as an option to build small and efficient PEs for large-scale shared-L1 clusters. We propose Spatz, a compact, modular 32-bit vector processing unit based on the integer embedded subset of the RISC-V Vector Extension version 1.0. A Spatz-based cluster with four Multiply-Accumulate Units (MACUs) needs only 7.9 pJ per 32-bit integer multiply-accumulate operation, 40% less energy than an equivalent cluster built with four Snitch scalar cores. We analyzed Spatz's performance by integrating it within MemPool, a large-scale many-core shared-L1 cluster. The Spatz-based MemPool system achieves up to 285 GOPS when running a 256x256 32-bit integer matrix multiplication, 70% more than the equivalent Snitch-based MemPool system. In terms of energy efficiency, the Spatz-based MemPool system achieves up to 266 GOPS/W when running the same kernel, more than twice the energy efficiency of the Snitch-based MemPool system, which reaches 128 GOPS/W. These results show the viability of lean vector processors as high-performance and energy-efficient PEs for large-scale clusters with tightly-coupled L1 memory. Comment: 9 pages. Accepted for publication in the 2022 International Conference on Computer-Aided Design (ICCAD 2022).
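
    As a rough illustration of the target workload, the plain C kernel below is the kind of 32-bit integer multiply-accumulate loop such a cluster spends its time in; on a vector PE like Spatz, a single vector instruction covers a whole chunk of the innermost iterations, which is what amortizes instruction fetch and decode energy (our sketch, not Spatz code).

        #include <stdint.h>

        /* 32-bit integer matrix multiplication C += A * B, row-major.
         * The innermost multiply-accumulate dominates the instruction stream;
         * a vector PE issues one vector MAC for a whole chunk of k iterations,
         * amortizing fetch/decode energy over many elements. */
        void matmul_i32(int n, const int32_t *A, const int32_t *B, int32_t *C)
        {
            for (int i = 0; i < n; ++i)
                for (int j = 0; j < n; ++j) {
                    int32_t acc = C[i * n + j];
                    for (int k = 0; k < n; ++k)
                        acc += A[i * n + k] * B[k * n + j];   /* MAC */
                    C[i * n + j] = acc;
                }
        }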

    Ara2: Exploring Single- and Multi-Core Vector Processing with an Efficient RVV1.0 Compliant Open-Source Processor

    Vector processing is highly effective in boosting processor performance and efficiency for data-parallel workloads. In this paper, we present Ara2, the first fully open-source vector processor to support the RISC-V V 1.0 frozen ISA. We evaluate Ara2's performance on a diverse set of data-parallel kernels for various problem sizes and vector-unit configurations, achieving an average functional-unit utilization of 95% on the most computationally intensive kernels. We pinpoint performance boosters and bottlenecks, including the scalar core, memories, and vector architecture, providing insights into the main drivers of vector architecture performance. Leveraging the openness of the design, we implement Ara2 in a 22 nm technology, characterize its PPA metrics on various configurations (2-16 lanes), and analyze its microarchitecture and implementation bottlenecks. Ara2 achieves a state-of-the-art energy efficiency of 37.8 DP-GFLOPS/W (at 0.8 V) and a clock frequency of 1.35 GHz (critical path: ~40 FO4 gates). Finally, we explore the performance and energy-efficiency trade-offs of multi-core vector processors: we find that multiple vector cores help overcome the scalar-core issue-rate bound that limits short-vector performance. For example, a cluster of eight 2-lane Ara2 instances (16 FPUs) achieves more than 3x better performance than a 16-lane single-core Ara2 (16 FPUs) when executing a 32x32x32 matrix multiplication, with 1.5x improved energy efficiency.

    Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication

    From classical HPC to deep learning, MatMul is at the heart of today's computing. The recent Maddness method approximates MatMul without the need for multiplication by using a hash-based version of product quantization (PQ) to index into a look-up table (LUT). Stella Nera is the first Maddness accelerator and it achieves 15x higher area efficiency (GMAC/s/mm^2) and more than 25x higher energy efficiency (TMAC/s/W) than direct MatMul accelerators implemented in the same technology. The hash function is a decision tree, which allows for an efficient hardware implementation as the multiply-accumulate operations are replaced by decision tree passes and LUT lookups. The entire Maddness MatMul can be broken down into parts that allow an effective implementation with small computing units and memories, allowing it to reach extreme efficiency while remaining generically applicable to MatMul tasks. In a commercial 14 nm technology and scaled to 3 nm, we achieve an energy efficiency of 161 TOp/s/W with a Top-1 accuracy on CIFAR-10 of more than 92.5% using ResNet9. Comment: 6 pages, 7 figures, preprint under review.
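
    The following C fragment sketches the LUT-based idea in the spirit of PQ/Maddness, assuming the encoding step (Maddness uses learned decision trees for it) and the lookup tables have already been computed; names and data layout are hypothetical, not the paper's implementation.

        #include <stdint.h>

        /* Approximate A (n x d) times B (d x m) via product quantization:
         * A's rows are split into n_codebooks subspaces, each subvector is
         * encoded to one of n_protos prototypes, and precomputed tables hold
         * prototype-times-B partial products. The MatMul then reduces to
         * table lookups and additions, with no multiplications at run time. */
        void pq_matmul(int n, int m, int n_codebooks, int n_protos,
                       const uint8_t *codes,   /* n x n_codebooks prototype ids    */
                       const float *lut,       /* n_codebooks x n_protos x m table */
                       float *out)             /* n x m approximate result         */
        {
            for (int i = 0; i < n; ++i)
                for (int j = 0; j < m; ++j) {
                    float acc = 0.0f;
                    for (int c = 0; c < n_codebooks; ++c) {
                        uint8_t k = codes[i * n_codebooks + c];   /* encoding (hash) step */
                        acc += lut[(c * n_protos + k) * m + j];   /* LUT lookup           */
                    }
                    out[i * m + j] = acc;
                }
        }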

    Multi-Complexity-Loss DNAS for Energy-Efficient and Memory-Constrained Deep Neural Networks

    Neural Architecture Search (NAS) is increasingly popular for automatically exploring the accuracy versus computational complexity trade-off of Deep Learning (DL) architectures. When targeting tiny edge devices, the main challenge for DL deployment is meeting tight memory constraints, hence most NAS algorithms consider model size as the complexity metric. Other methods reduce the energy or latency of DL models by trading off accuracy and the number of inference operations. Energy and memory are rarely considered simultaneously, in particular by low-search-cost Differentiable NAS (DNAS) solutions. We overcome this limitation by proposing the first DNAS that directly addresses the most realistic scenario from a designer's perspective: the co-optimization of accuracy and energy (or latency) under a memory constraint determined by the target hardware. We do so by combining two complexity-dependent loss functions during training, each with an independent strength. Testing on three edge-relevant tasks from the MLPerf Tiny benchmark suite, we obtain rich Pareto sets of architectures in the energy vs. accuracy space, with memory footprint constraints spanning from 75% to 6.25% of the baseline networks. When deployed on a commercial edge device, the STM NUCLEO-H743ZI2, our networks span a range of 2.18x in energy consumption and 4.04% in accuracy for the same memory constraint, and reduce energy by up to 2.2x with a negligible accuracy drop with respect to the baseline. Comment: Accepted for publication at the ISLPED 2022 ACM/IEEE International Symposium on Low Power Electronics and Design.
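
    One plausible way to write such a combined objective, shown only as an illustration consistent with the description above and not as the paper's exact loss, is a task loss plus an energy (or latency) cost term and a memory-excess penalty, each weighted by its own strength:

        % Task loss plus complexity-dependent terms with independent strengths:
        % C_energy(theta) models energy/latency, S(theta) the model size,
        % and S_budget the memory budget of the target hardware.
        \mathcal{L}(\theta) = \mathcal{L}_{\mathrm{task}}(\theta)
          + \lambda_{\mathrm{cost}}\,\mathcal{C}_{\mathrm{energy}}(\theta)
          + \lambda_{\mathrm{mem}}\,\max\bigl(0,\ \mathcal{S}(\theta) - \mathcal{S}_{\mathrm{budget}}\bigr)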