25 research outputs found

    Automatic adaptive multi-point moment matching for descriptor system model order reduction

    Get PDF
    We propose a novel automatic adaptive multi-point moment matching algorithm for model order reduction (MOR) of descriptor systems. The algorithm implements both adaptive frequency expansion point selection and automatic moment order control via a transfer function based error metric. Without a priori information of the system response, the proposed algorithm guarantees a much higher global accuracy compared with standard multi-point moment matching without adaptation. The moments are computed via a generalized Sylvester equation which is subsequently solved by a newly proposed generalized alternating direction implicit (GADI) method. Numerical examples then confirm the efficacy of the proposed schemes. © 2013 IEEE.published_or_final_versio

    Full-System Simulation of Mobile CPU/GPU Platforms

    Get PDF
    Graphics Processing Units (GPUs) critically rely on a complex system software stack comprising kernel- and userspace drivers and Just-in-time (JIT) compilers. Yet, existing GPU simulators typically abstract away details of the software stack and GPU instruction set. Partly, this is because GPU vendors rarely release sufficient information about their latest GPU products. However, this is also due to the lack of an integrated CPU/GPU simulation framework, which is complete and powerful enough to drive the complex GPU software environment. This has led to a situation where research on GPU architectures and compilers is largely based on outdated or greatly simplified architectures and software stacks, undermining the validity of the generated results. In this paper we develop a full-system system simulation environment for a mobile platform, which enables users to run a complete and unmodified software stack for a state-of-the-art mobile Arm CPU and Mali-G71 GPU powered device. We validate our simulator against a hardware implementation and Arm’s stand-alone GPU simulator, achieving 100% architectural accuracy across all available toolchains. We demonstrate the capability of our GPU simulation framework by optimizing an advanced Computer Vision application using simulated statistics unavailable with other simulation approaches or physical GPU implementations. We demonstrate that performance optimizations for desktop GPUs trigger bottlenecks on mobile GPUs, and show the importance of efficient memory use.Postprin

    Fast Obstacle Detection System for UAS Based on Complementary Use of Radar and Stereoscopic Camera

    Get PDF
    Autonomous unmanned aerial systems (UAS) are having an increasing impact in the scientific community. One of the most challenging problems in this research area is the design of robust real-time obstacle detection and avoidance systems. In the automotive field, applications of obstacle detection systems combining radar and vision sensors are common and widely documented. However, these technologies are not currently employed in the UAS field due to the major complexity of the flight scenario, especially in urban environments. In this paper, a real-time obstacle-detection system based on the use of a 77 GHz radar and a stereoscopic camera is proposed for use in small UASs. The resulting system is capable of detecting obstacles in a broad spectrum of environmental conditions. In particular, the vision system guarantees a high resolution for short distances, while the radar has a lower resolution but can cover greater distances, being insensitive to poor lighting conditions. The developed hardware and software architecture and the related obstacle-detection algorithm are illustrated within the European project AURORA. Experimental results carried out employing a small UAS show the effectiveness of the obstacle detection system and of a simple avoidance strategy during several autonomous missions on a test site

    Techniques of Energy-Efficient VLSI Chip Design for High-Performance Computing

    Get PDF
    How to implement quality computing with the limited power budget is the key factor to move very large scale integration (VLSI) chip design forward. This work introduces various techniques of low power VLSI design used for state of art computing. From the viewpoint of power supply, conventional in-chip voltage regulators based on analog blocks bring the large overhead of both power and area to computational chips. Motivated by this, a digital based switchable pin method to dynamically regulate power at low circuit cost has been proposed to make computing to be executed with a stable voltage supply. For one of the widely used and time consuming arithmetic units, multiplier, its operation in logarithmic domain shows an advantageous performance compared to that in binary domain considering computation latency, power and area. However, the introduced conversion error reduces the reliability of the following computation (e.g. multiplication and division.). In this work, a fast calibration method suppressing the conversion error and its VLSI implementation are proposed. The proposed logarithmic converter can be supplied by dc power to achieve fast conversion and clocked power to reduce the power dissipated during conversion. Going out of traditional computation methods and widely used static logic, neuron-like cell is also studied in this work. Using multiple input floating gate (MIFG) metal-oxide semiconductor field-effect transistor (MOSFET) based logic, a 32-bit, 16-operation arithmetic logic unit (ALU) with zipped decoding and a feedback loop is designed. The proposed ALU can reduce the switching power and has a strong driven-in capability due to coupling capacitors compared to static logic based ALU. Besides, recent neural computations bring serious challenges to digital VLSI implementation due to overload matrix multiplications and non-linear functions. An analog VLSI design which is compatible to external digital environment is proposed for the network of long short-term memory (LSTM). The entire analog based network computes much faster and has higher energy efficiency than the digital one

    Revisiting the high-performance reconfigurable computing for future datacenters

    Get PDF
    Modern datacenters are reinforcing the computational power and energy efficiency by assimilating field programmable gate arrays (FPGAs). The sustainability of this large-scale integration depends on enabling multi-tenant FPGAs. This requisite amplifies the importance of communication architecture and virtualization method with the required features in order to meet the high-end objective. Consequently, in the last decade, academia and industry proposed several virtualization techniques and hardware architectures for addressing resource management, scheduling, adoptability, segregation, scalability, performance-overhead, availability, programmability, time-to-market, security, and mainly, multitenancy. This paper provides an extensive survey covering three important aspects-discussion on non-standard terms used in existing literature, network-on-chip evaluation choices as a mean to explore the communication architecture, and virtualization methods under latest classification. The purpose is to emphasize the importance of choosing appropriate communication architecture, virtualization technique and standard language to evolve the multi-tenant FPGAs in datacenters. None of the previous surveys encapsulated these aspects in one writing. Open problems are indicated for scientific community as well

    An Ultra-Low-Power RFID/NFC Frontend IC Using 0.18 μm CMOS Technology for Passive Tag Applications

    Get PDF
    Battery-less passive sensor tags based on RFID or NFC technology have achieved much popularity in recent times. Passive tags are widely used for various applications like inventory control or in biotelemetry. In this paper, we present a new RFID/NFC frontend IC (integrated circuit) for 13.56 MHz passive tag applications. The design of the frontend IC is compatible with the standard ISO 15693/NFC 5. The paper discusses the analog design part in details with a brief overview of the digital interface and some of the critical measured parameters. A novel approach is adopted for the demodulator design, to demodulate the 10% ASK (amplitude shift keying) signal. The demodulator circuit consists of a comparator designed with a preset offset voltage. The comparator circuit design is discussed in detail. The power consumption of the bandgap reference circuit is used as the load for the envelope detection of the ASK modulated signal. The sub-threshold operation and low-supply-voltage are used extensively in the analog design—to keep the power consumption low. The IC was fabricated using 0.18 μ m CMOS technology in a die area of 1.5 mm × 1.5 mm and an effective area of 0.7 m m 2 . The minimum supply voltage desired is 1.2 V, for which the total power consumption is 107 μ W. The analog part of the design consumes only 36 μ W, which is low in comparison to other contemporary passive tags ICs. Eventually, a passive tag is developed using the frontend IC, a microcontroller, a temperature and a pressure sensor. A smart NFC device is used to readout the sensor data from the tag employing an Android-based application software. The measurement results demonstrate the full passive operational capability. The IC is suitable for low-power and low-cost industrial or biomedical battery-less sensor applications. A figure-of-merit (FOM) is proposed in this paper which is taken as a reference for comparison with other related state-of-the-art researches

    Design of a Programmable Passive SoC for Biomedical Applications Using RFID ISO 15693/NFC5 Interface

    Get PDF
    Low power, low cost inductively powered passive biotelemetry system involving fully customized RFID/NFC interface base SoC has gained popularity in the last decades. However, most of the SoCs developed are application specific and lacks either on-chip computational or sensor readout capability. In this paper, we present design details of a programmable passive SoC in compliance with ISO 15693/NFC5 standard for biomedical applications. The integrated system consists of a 32-bit microcontroller, a sensor readout circuit, a 12-bit SAR type ADC, 16 kB RAM, 16 kB ROM and other digital peripherals. The design is implemented in a 0.18 μ m CMOS technology and used a die area of 1.52 mm × 3.24 mm. The simulated maximum power consumption of the analog block is 592 μ W. The number of external components required by the SoC is limited to an external memory device, sensors, antenna and some passive components. The external memory device contains the application specific firmware. Based on the application, the firmware can be modified accordingly. The SoC design is suitable for medical implants to measure physiological parameters like temperature, pressure or ECG. As an application example, the authors have proposed a bioimplant to measure arterial blood pressure for patients suffering from Peripheral Artery Disease (PAD)

    Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition

    Get PDF
    Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing this temporal dynamics is fundamental for successful HAR. Based on the recent success of recurrent neural networks for time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. Our results show that our framework outperforms competing deep non-recurrent networks on the challenge dataset by 4% on average; outperforming some of the previous reported results by up to 9%. Our results show that the framework can be applied to homogeneous sensor modalities, but can also fuse multimodal sensors to improve performance. We characterise key architectural hyperparameters’ influence on performance to provide insights about their optimisation
    corecore