32 research outputs found

    Exploiting primitive grouping constraints for noise robust automatic speech recognition : studies with simultaneous speech.

    Significant strides have been made in the field of automatic speech recognition over the past three decades. However, the systems are not robust; their performance degrades in the presence of even moderate amounts of noise. This thesis presents an approach to developing a speech recognition system that takes inspiration from the approach of human speech recognition

    A Structured Design Methodology for High Performance VLSI Arrays

    With the geometric growth in integrated circuit technology due to transistor scaling, together with the system-on-chip design strategy, the complexity of integrated circuits has increased manifold. Short time to market with high reliability and performance is one of the most competitive challenges. Both custom and ASIC design methodologies have evolved over time to cope with this, but the high manual labor in custom design and the statistic design in ASIC are still causes of concern. This work proposes a new circuit design strategy that focuses mostly on arrayed structures such as TLB, RF, Cache, and IPCAM; it reduces the manual effort to a great extent and makes the design regular and repetitive while still achieving high performance. The method proposes making the complete design a custom schematic but using standard cells. This requires adding some custom cells to the already exhaustive library to optimize the design for performance. Once the schematic is finalized, the designer places these standard cells in a spreadsheet, placing the cells in the critical paths close together. A Perl script then generates a Cadence Encounter compatible placement file. The design is then routed in Encounter. Since the designer is the best judge of the circuit architecture, placement by the designer allows the most optimal design to be achieved. Several designs, including IPCAM, issue logic, TLB, RF and Cache, were carried out and their performance was compared against the fully custom and ASIC flows. The TLB, RF and Cache were part of the HEMES microprocessor. Dissertation/Thesis, Ph.D. Electrical Engineering 201

    Calcul sur architecture non fiable

    Although circuits could in theory be fabricated error-free at a huge cost using worst-case design methodologies, they remain susceptible to transient faults caused by radiation, temperature sensitivity, and similar effects. By contrast, an error-resilient design relieves the manufacturing process of the variability issue and thereby saves material cost. Since variability and transient upsets are worsening as new fabrication processes emerge and feature sizes shrink, the need for robust design is pressing. This thesis addresses the issue of designing on unreliable circuits. The main contributions are fourfold. First, a fast error-correction, low-cost redundancy fault-tolerant method is presented. Second, we introduce judicious two-dimensional criteria to estimate the reliability and the hardware efficiency of a circuit. Third, a general-purpose model offers low-redundancy error resilience for contemporary logic systems as well as future nanoelectronic architectures. Finally, a decoder robust against internal transient faults is designed in this work.
In theory, electronic circuits designed with the worst-case method are supposed to guarantee error-free operation at a high implementation cost. In practice, circuits remain subject to transient errors because of their sensitivity to hazards such as radiation and temperature. By contrast, a design that takes fault tolerance into account makes it possible to cope with hazards such as manufacturing process variability. Moreover, transient errors and manufacturing variability are intensifying with the emergence of new fabrication processes and ever smaller circuits. The demand for designs that integrate fault tolerance is therefore becoming paramount. This thesis aims to define the problem of designing circuits on unreliable chips and makes contributions along four lines. First, we propose a fault-tolerance method based on error correction and low-cost redundancy. Next, we present a judicious two-dimensional criterion for evaluating the reliability and hardware efficiency of circuits. We then propose a universal model that provides low-redundancy fault tolerance for today's logic systems and tomorrow's nanoelectronic architectures. Finally, we unveil a decoder that tolerates internal transient faults

    Time Series Analysis and Classification with State-Space Models for Industrial Processes and the Life Sciences

    In this thesis the use of state-space models for analysis and classification of time series data, gathered from industrial manufacturing processes and the life sciences, is investigated. To overcome hitherto unsolved problems in both application domains, the temporal behavior of the data is captured using state-space models. Industrial laser welding processes are monitored with a high-speed camera, and the appearance of unusual events in the image sequences correlates with errors on the produced part. Thus, novel classification frameworks are developed to robustly detect these unusual events with a small false positive rate. For classifier learning, class labels are by default only available for the complete image sequence, since scanning the sequences for anomalies is expensive. The first framework combines appearance-based features and state-space models for unusual event detection in image sequences. For the first time, ideas adapted from face recognition are used for the automatic dimension reduction of images recorded from laser welding processes. The state-space model is trained incrementally and can learn from erroneous sequences without the need to manually label the position of the error event within sequences. The limitation to weakly labeled data helps to reduce the labeling effort. In addition, a second framework for the object-based detection of sputter events in laser welding processes is developed. The framework successfully combines, for the first time, temporal change detection, object tracking and trajectory classification for the detection of weak sputter events. This is the first time that object tracking is successfully applied to automatic sputter detection. For the application in the life sciences, the improvement and further development of data analysis methods for Single Molecule Fluorescence Spectroscopy (SMFS) is considered. SMFS experiments allow biochemical processes to be studied on a single-molecule basis. 
The single molecule is excited with a laser and the photons which are emitted thereon by fluorescence contain important information about conformational changes of the molecule. Advanced statistical analysis techniques are necessary to infer state changes of the molecule from changes in the photon emissions. By using state-space models, it is possible to extract information from recorded photon streams which would be lost with traditional analysis techniques

    Enhanced coding, clock recovery and detection for a magnetic credit card

    Merged with duplicate record 10026.1/2299 on 03.04.2017 by CS (TIS). This thesis describes the background, investigation and construction of a system for storing data on the magnetic stripe of a standard three-inch plastic credit card. Investigation shows that the information storage limit within a 3.375 in by 0.11 in rectangle of the stripe is bounded to about 20 kBytes. Practical issues limit the data storage to around 300 Bytes with a low raw error rate: a four-fold density increase over the standard. Removal of the timing jitter (that is probably caused by the magnetic medium particle size) would increase the limit to 1500 Bytes with no other system changes. This is enough capacity for either a small digital passport photograph or a digitized signature, making it possible to remove printed versions from the surface of the card. To achieve even these modest gains has required the development of a new variable rate code that is more resilient to timing errors than other codes in its efficiency class. The tabulation of the effects of timing errors required the construction of a new code metric and self-recovering decoders. In addition, a new method of timing recovery, based on signal 'snatches', has been invented to increase the rapidity with which a Bayesian decoder can track the changing velocity of a hand-swiped card. The timing recovery and Bayesian detector have been integrated into one computation (software) unit that is self-contained and can decode a general class of (d, k) constrained codes. Additionally, the unit has a signal truncation mechanism to alleviate some of the effects of non-linear distortion that are present when a magnetic card is read with a magneto-resistive magnetic sensor that has been driven beyond its bias magnetization. While the storage density is low and the total storage capacity is meagre in comparison with contemporary storage devices, the high density card may still have a niche role to play in society. 
Nevertheless, in the face of the Smart card its long term outlook is uncertain. However, several areas of coding and detection under short-duration extreme conditions have brought new decoding methods to light. The scope of these methods is not limited just to the credit card
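    The (d, k) constraint named in the abstract above can be illustrated with a short check. This is a minimal sketch, not the thesis's decoder: it assumes the usual run-length-limited convention in which at least d and at most k zeros separate consecutive ones, and it assumes that leading and trailing zero runs are only held to the upper bound k. The function name `satisfies_dk` is hypothetical.

```python
def satisfies_dk(bits, d, k):
    """Check a binary sequence against a (d, k) run-length constraint.

    Assumed convention: every run of zeros between two consecutive ones
    has length >= d and <= k. Leading and trailing runs of zeros are only
    checked against the upper bound k (an assumption of this sketch).
    """
    run = 0           # length of the current run of zeros
    seen_one = False  # True once the first 1 has been read
    for b in bits:
        if b == 0:
            run += 1
            if run > k:
                return False  # too many zeros in a row
        else:
            if seen_one and run < d:
                return False  # ones are too close together
            seen_one = True
            run = 0
    return True


# Example with a (1, 3) constraint: adjacent ones are forbidden,
# and at most three zeros may appear in a row.
print(satisfies_dk([1, 0, 1, 0, 0, 0, 1], 1, 3))
print(satisfies_dk([1, 1, 0, 1], 1, 3))
```

    The lower bound d limits how closely flux transitions can be packed, while the upper bound k guarantees transitions often enough for timing recovery, which is why such a check is relevant to a hand-swiped card whose velocity changes during the read.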

    Cross-Layer Optimization for Power-Efficient and Robust Digital Circuits and Systems

    With the increasing demand for digital services, performance and power efficiency become vital requirements for digital circuits and systems. However, the enabling CMOS technology scaling has been facing significant challenges from device uncertainties, such as process, voltage, and temperature variations. To ensure system reliability, worst-case corner assumptions are usually made at each design level. However, the over-pessimistic worst-case margin leads to unnecessary power waste and performance loss as high as 2.2x. Since optimizations are traditionally confined to each specific level, those safe margins can hardly be properly exploited. To tackle the challenge, it is therefore advised in this Ph.D. thesis to perform a cross-layer optimization for digital signal processing circuits and systems, to achieve a global balance of power consumption and output quality. To conclude, the traditional over-pessimistic worst-case approach leads to huge power waste. In contrast, the adaptive voltage scaling approach saves power (25% for the CORDIC application) by providing a just-needed supply voltage. The power saving is maximized (46% for CORDIC) when a more aggressive voltage over-scaling scheme is applied. The sparsely occurring circuit errors produced by aggressive voltage over-scaling are mitigated by higher-level error-resilient designs. For functions like FFT and CORDIC, smart error mitigation schemes were proposed to enhance reliability (against soft errors and timing errors, respectively). Applications like Massive MIMO systems are robust against lower-level errors, thanks to the intrinsically redundant antennas. This property makes it feasible to embrace digital hardware that trades quality for power savings. Comment: 190 page

    Processing of Ex-Situ Acquired Signals from Magnetic Disks

    The ubiquity and high performance of hard disk drives for nonvolatile digital data storage cannot be denied. As the magnetic recording industry continues to develop new techniques for increasing storage density and reducing cost per bit, diagnostic and forensic tools for characterizing and interpreting the magnetic patterns recorded onto disk drive media become increasingly important. Therefore, this dissertation presents developments to the uniquely suitable spin-stand-based method of imaging magnetization patterns on media extracted from commercial hard disk drives. The emphasis of the presented research is placed on the following three areas: microscopy enhancement techniques for longitudinal magnetic recording media, "drive-independent" characterization and reconstruction of disk data, and the exploration of spin-stand microscopy in the novel context of perpendicular magnetic recording. First, it is known that, while the spin-stand microscopy technique offers high-speed and massive scale imaging capabilities, the images obtained are corrupted by distortion due to the non-local sensing or finite spatial resolution of the imaging sensor. Two techniques for mitigating this distortion, one based on characterizing the head by means of its linear response function, and a new method based on spatial Hilbert transforms, are described and demonstrated. Furthermore, a two-dimensional extension of the Hilbert transform in the context of magnetic recording is derived based on physical arguments and its application to spin-stand imaging is demonstrated. Second, although magnetic media imaging is interesting in its own right, an extension of this capability is the identification of commercial hard disk drive write channels and the subsequent reconstruction of the data written to the associated disks in a "drive-independent" manner on the spin-stand. 
For fundamental and practical reasons, a multilayered encoding process is performed on digital data before it is written to the disk; the presented work details the theoretical and experimental results obtained in characterizing and reversing these codes. Finally, because perpendicular recording technology has recently come on the market in consumer disk drives, the spin-stand microscopy technique is extended to imaging the media employing this new mode of recording. In particular, the novel aspects of perpendicular recording are discussed and their impact on spin-stand microscopy is demonstrated

    Reconfigurable Antenna Systems: Platform implementation and low-power matters

    Antennas are a necessary and often critical component of all wireless systems, of which they share the ever-increasing complexity and the challenges of present and emerging trends. 5G, massive low-orbit satellite architectures (e.g. OneWeb), Industry 4.0, Internet of Things (IoT), satcom on-the-move, Advanced Driver Assistance Systems (ADAS) and Autonomous Vehicles all call for highly flexible systems, and antenna reconfigurability is an enabling part of these advances. The terminal segment is particularly crucial in this sense, encompassing both very compact antennas and low-profile antennas, all with various adaptability/reconfigurability requirements. This thesis work has dealt with hardware implementation issues of Radio Frequency (RF) antenna reconfigurability, and in particular with low-power General Purpose Platforms (GPP); the work has encompassed Software Defined Radio (SDR) implementation, as well as embedded low-power platforms (in particular the STM32 Nucleo family of micro-controllers). The hardware-software platform work has been complemented with the design and fabrication of reconfigurable antennas in standard technology, and the resulting systems were tested. The selected antenna technology was an antenna array with a continuously steerable beam, controlled by voltage-driven phase shifting circuits. Applications notably included a Wireless Sensor Network (WSN) deployed in the Italian scientific mission in Antarctica, a traffic-monitoring case study (EU H2020 project), and an innovative Global Navigation Satellite Systems (GNSS) antenna concept (patent application submitted). The SDR implementation focused on a low-cost and low-power open-source Software Defined Radio platform with IEEE 802.11 a/g/p wireless communication capability. In a second embodiment, the flexibility of the SDR paradigm has been traded off to avoid the power consumption associated with the relevant operating system. 
The application field of reconfigurable antennas is, however, not limited to better management of energy consumption. The analysis has also been extended to satellite positioning applications. A novel beamforming method has been presented, demonstrating improvements in the quality of signals received from satellites. For those who deal with positioning algorithms, this advancement helps improve the precision of the estimated position

    On the Design of Future Communication Systems with Coded Transport, Storage, and Computing

    Communication systems are experiencing a fundamental change. There are novel applications that require increased performance from these systems, not only in throughput but also in latency, reliability, security, and heterogeneity support. To fulfil the requirements, future systems understand communication not only as the transport of bits but also as their storage, processing, and relation. In these systems, every network node has transport, storage, and computing resources that the network operator and its users can exploit through virtualisation and softwarisation of the resources. It is within this context that this work presents its results. We propose distributed coded approaches to improve communication systems. Our results improve the reliability and latency performance of the transport of information. They also increase the reliability, flexibility, and throughput of storage applications. Furthermore, based on the lesson that coded approaches improve the transport and storage performance of communication systems, we propose a distributed coded approach for the computing of novel in-network applications such as the steering and control of cyber-physical systems. Our proposed approach can increase the reliability and latency performance of distributed in-network computing in the presence of errors, erasures, and attackers

    Energy-Efficient Recurrent Neural Network Accelerators for Real-Time Inference

    Over the past decade, Deep Learning (DL) and Deep Neural Networks (DNN) have gone through rapid development. They are now widely applied to various applications and have profoundly changed the life of human beings. As an essential element of DNN, Recurrent Neural Networks (RNN) are helpful in processing time-sequential data and are widely used in applications such as speech recognition and machine translation. RNNs are difficult to compute because of their massive arithmetic operations and large memory footprint. RNN inference workloads used to be executed on conventional general-purpose processors including Central Processing Units (CPU) and Graphics Processing Units (GPU); however, these have hardware blocks that are unnecessary for RNN computation, such as branch predictors and caching systems, making them suboptimal for RNN processing. To accelerate RNN computations and outperform conventional processors, previous work focused on optimization methods on both software and hardware. On the software side, previous works mainly used model compression to reduce the memory footprint and the arithmetic operations of RNNs. On the hardware side, previous works also designed domain-specific hardware accelerators based on Field Programmable Gate Arrays (FPGA) or Application Specific Integrated Circuits (ASIC) with customized hardware pipelines optimized for efficient processing of RNNs. By following this software-hardware co-design strategy, previous works achieved at least 10X speedup over conventional processors. Many previous works focused on achieving high throughput with a large batch of input streams. However, in real-time applications, such as gaming Artificial Intelligence (AI) and dynamical system control, low latency is more critical. Moreover, there is a trend of offloading neural network workloads to edge devices to provide a better user experience and privacy protection. 
Edge devices, such as mobile phones and wearable devices, are usually resource-constrained with a tight power budget. They require RNN hardware that is more energy-efficient to realize both low-latency inference and long battery life. Brain neurons have sparsity in both the spatial domain and time domain. Inspired by this human nature, previous work mainly explored model compression to induce spatial sparsity in RNNs. The delta network algorithm alternatively induces temporal sparsity in RNNs and can save over 10X arithmetic operations in RNNs, as shown by previous works. In this work, we have proposed customized hardware accelerators to exploit temporal sparsity in Gated Recurrent Unit (GRU)-RNNs and Long Short-Term Memory (LSTM)-RNNs to achieve energy-efficient real-time RNN inference. First, we have proposed DeltaRNN, the first-ever RNN accelerator to exploit temporal sparsity in GRU-RNNs. DeltaRNN has achieved 1.2 TOp/s effective throughput with a batch size of 1, which is 15X higher than its related works. Second, we have designed EdgeDRNN to accelerate GRU-RNN edge inference. Compared to DeltaRNN, EdgeDRNN does not rely on on-chip memory to store RNN weights and focuses on reducing off-chip Dynamic Random Access Memory (DRAM) data traffic using a more scalable architecture. EdgeDRNN has realized real-time inference of large GRU-RNNs with submillisecond latency and only 2.3 W wall plug power consumption, achieving 4X higher energy efficiency than commercial edge AI platforms like NVIDIA Jetson Nano. Third, we have used DeltaRNN to realize the first-ever continuous speech recognition system with the Dynamic Audio Sensor (DAS) as the front-end. The DAS is a neuromorphic event-driven sensor that produces a stream of asynchronous events instead of audio data sampled at a fixed sample rate. 
We have also showcased how an RNN accelerator can be integrated with an event-driven sensor on the same chip to realize ultra-low-power Keyword Spotting (KWS) on the extreme edge. Fourth, we have used EdgeDRNN to control a powered robotic prosthesis using an RNN controller to replace a conventional proportional–derivative (PD) controller. EdgeDRNN has achieved 21 μs latency of running the RNN controller and could maintain stable control of the prosthesis. We have used DeltaRNN and EdgeDRNN to solve these problems to prove their value in solving real-world problems. Finally, we have applied the delta network algorithm on LSTM-RNNs and have combined it with a customized structured pruning method, called Column-Balanced Targeted Dropout (CBTD), to induce spatio-temporal sparsity in LSTM-RNNs. Then, we have proposed another FPGA-based accelerator called Spartus, the first RNN accelerator that exploits spatio-temporal sparsity. Spartus achieved 9.4 TOp/s effective throughput with a batch size of 1, the highest among present FPGA-based RNN accelerators with a power budget around 10 W. Spartus can complete the inference of an LSTM layer having 5 million parameters within 1 μs
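    The temporal sparsity that the abstract above attributes to the delta network algorithm can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the accelerators' implementation: it assumes the commonly described formulation in which an input element is propagated only when it has changed by more than a threshold since it was last propagated, and the matrix-vector product is updated incrementally from those changes. The names `delta_matvec_step`, `x_ref` and `threshold` are hypothetical.

```python
def delta_matvec_step(W, x, x_ref, y_prev, threshold):
    """Incrementally update y = W x using only significant input changes.

    W         : weight matrix as a list of rows
    x         : current input vector
    x_ref     : input values as of their last propagated update
    y_prev    : accumulated output from the previous step
    threshold : minimum absolute change that triggers an update

    Returns (y, new_x_ref, macs), where macs counts the multiply-accumulate
    operations actually performed.
    """
    y = list(y_prev)
    new_ref = list(x_ref)
    macs = 0
    for j, (xj, rj) in enumerate(zip(x, x_ref)):
        delta = xj - rj
        if abs(delta) > threshold:
            new_ref[j] = xj              # remember the propagated value
            for i in range(len(W)):      # update only column j's contribution
                y[i] += W[i][j] * delta
                macs += 1
        # otherwise the whole column is skipped: this is the temporal sparsity
    return y, new_ref, macs


W = [[1.0, 2.0], [3.0, 4.0]]
y, ref = [0.0, 0.0], [0.0, 0.0]

# Step 1: both inputs change by more than the threshold, so all 4 MACs run.
y, ref, macs1 = delta_matvec_step(W, [1.0, 0.5], ref, y, threshold=0.1)

# Step 2: the first input barely moves, so its column is skipped (2 MACs).
y, ref, macs2 = delta_matvec_step(W, [1.05, 1.5], ref, y, threshold=0.1)
print(y, macs1, macs2)
```

    The result after the second step is an approximation of the exact product, because the small change in the first input is never propagated; the threshold trades accuracy for skipped operations.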