30 research outputs found

    Image Processing Using FPGAs

    Get PDF
    This book presents a selection of papers representing current research on using field programmable gate arrays (FPGAs) for realising image processing algorithms. These papers are reprints of papers selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state-of-the-art on image processing using FPGAs

    AN ARCHITECTURAL APPROACH FOR REDUCINGPOWER AND INCREASING SECURITY OF RFID TAGS

    Get PDF
    Radio Frequency Identification (RFID) technology is currently employed for a variety of applications such as RFID-based wireless payment, healthcare, homeland security, asset management,etc. Due to newer privacy requirements and increasingly secure applications, typical RFID tags are required to expand security features such as data encryption and safe transactions. However, RFID tags have extremely strict low-power consumption requirements. Thus, reduced power consumption and secure data transactions are two main problems for the next generation RFID tags.This dissertation presents an architectural approach to address these two main problems.This dissertation provides a multi-domain solution to improve the power consumption andsecurity, while also reducing design time and verification time of the system. In particular, Idescribe (1)a smart buffering technique to allow a tag to remain in a standby mode until addressed,(2)a multi-layer, low-power technique that transcends the passive-transaction, physical, and data layers to provide secure transactions, (3) an FPGA-based traffic profiler system to generate traces of RFID communications for both tag verification and power analysis without the need of actual hardware, and (4) a design automation technique to create physical layer encoding and decoding blocks in hardware suitable for RFID tags.This dissertation presents four contributions: (1) As a result, based on a Markov Process energymodel, the smart buffering technique is shown to reduce power consumption by 85% over a traditionalactive tag; (2) The multi-layer, low-power security technique provides protection againstmalicious reader attacks to disable the tag, to steal the information stored in or communicatedto the device. The power consumption overhead for implementing these layers of security is increased approximately 13% over the basic tag controller; (3) In addition, the FPGA-based traffic profiler system has been able to generate traces for ISO 18000 part 6C (EPC Gen2) protocol; and (4) The designs of endocing/decoding blocks are generated automatically by the Physical LayerSynthesis tool for five protocols used in or related to RFID. Consequently, any power consumption of five designs is less than 5 ÂŁgW. Furthermore, compared with five designs implemented by hand, the difference of the power consumption between two of them is less than 7% at most

    Energy efficient hardware acceleration of multimedia processing tools

    Get PDF
    The world of mobile devices is experiencing an ongoing trend of feature enhancement and generalpurpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks Based on the survey that this thesis presents on modem video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work m this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant pnor art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation - namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions .Results show an improvement on state of the art algorithms with future potential for even greater savings

    Preliminary multicore architecture for Introspective Computing

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 243-245).This thesis creates a framework for Introspective Computing. Introspective Computing is a computing paradigm characterized by self-aware software. Self-aware software systems use hardware mechanisms to observe an application's execution so that they may adapt execution to improve performance, reduce power consumption, or balance user-defined fitness criteria over time-varying conditions in a system environment. We dub our framework Partner Cores. The Partner Cores framework builds upon tiled multicore architectures [11, 10, 25, 9], closely coupling cores such that one may be used to observe and optimize execution in another. Partner cores incrementally collect and analyze execution traces from code cores then exploit knowledge of the hardware to optimize execution. This thesis develops a tiled architecture for the Partner Cores framework that we dub Evolve. Evolve provides a versatile substrate upon which software may coordinate core partnerships and various forms of parallelism. To do so, Evolve augments a basic tiled architecture with introspection hardware and programmable functional units. Partner Cores software systems on the Evolve hardware may follow the style of helper threading [13, 12, 6] or utilize the programmable functional units in each core to evolve application-specific coprocessor engines. This thesis work develops two Partner Cores software systems: the Dynamic Partner-Assisted Branch Predictor and the Introspective L2 Memory System (IL2). The branch predictor employs a partner core as a coprocessor engine for general dynamic branch prediction in a corresponding code core. The IL2 retasks the silicon resources of partner cores as banks of an on-chip, distributed, software L2 cache for code cores.(cont.) The IL2 employs aggressive, application-specific prefetchers for minimizing cache miss penalties and DRAM power consumption. Our results and future work show that the branch predictor is able to sustain prediction for code core branch frequencies as high as one every 7 instructions with no degradation in accuracy; updated prediction directions are available in a low minimum of 20-21 instructions. For the IL2, we develop a pixel block prefetcher for the image data structure used in a JPEG encoder benchmark and show that a 50% improvement in absolute performance is attainable.by Jonathan M. Eastep.S.M

    VIRTUAL MEMORY ON A MANY-CORE NOC

    Get PDF
    Many-core devices are likely to become increasingly common in real-time and embedded systems as computational demands grow and as expectations for higher performance can generally only be met by by increasing core numbers rather than relying on higher clock speeds. Network-on-chip devices, where multiple cores share a single slice of silicon and employ packetised communications, are a widely-deployed many-core option for system designers. As NoCs are expected to run larger and more complex programs, the small amount of fast, on-chip memory available to each core is unlikely to be sufficient for all but the simplest of tasks, and it is necessary to find an efficient, effective, and time-bounded, means of accessing resources stored in off-chip memory, such as DRAM or Flash storage. The abstraction of paged virtual memory is a familiar technique to manage similar tasks in general computing but has often been shunned by real-time developers because of concern about time predictability. We show it can be a poor choice for a many-core NoC system as, unmodified, it typically uses page sizes optimised for interaction with spinning disks and not solid state media, and transports significant volumes of subsequently unused data across already congested links. In this work we outline and simulate an efficient partial paging algorithm where only those memory resources that are locally accessed are transported between global and local storage. We further show that smaller page sizes add to efficiency. We examine the factors that lead to timing delays in such systems, and show we can predict worst case execution times at even safety-critical thresholds by using statistical methods from extreme value theory. We also show these results are applicable to systems with a variety of connections to memory

    Achieving network resiliency using sound theoretical and practical methods

    Get PDF
    Computer networks have revolutionized the life of every citizen in our modern intercon- nected society. The impact of networked systems spans every aspect of our lives, from financial transactions to healthcare and critical services, making these systems an attractive target for malicious entities that aim to make financial or political profit. Specifically, the past decade has witnessed an astounding increase in the number and complexity of sophisti- cated and targeted attacks, known as advanced persistent threats (APT). Those attacks led to a paradigm shift in the security and reliability communities’ perspective on system design; researchers and government agencies accepted the inevitability of incidents and malicious attacks, and marshaled their efforts into the design of resilient systems. Rather than focusing solely on preventing failures and attacks, resilient systems are able to maintain an acceptable level of operation in the presence of such incidents, and then recover gracefully into normal operation. Alongside prevention, resilient system design focuses on incident detection as well as timely response. Unfortunately, the resiliency efforts of research and industry experts have been hindered by an apparent schism between theory and practice, which allows attackers to maintain the upper hand advantage. This lack of compatibility between the theory and practice of system design is attributed to the following challenges. First, theoreticians often make impractical and unjustifiable assumptions that allow for mathematical tractability while sacrificing accuracy. Second, the security and reliability communities often lack clear definitions of success criteria when comparing different system models and designs. Third, system designers often make implicit or unstated assumptions to favor practicality and ease of design. Finally, resilient systems are tested in private and isolated environments where validation and reproducibility of the results are not publicly accessible. In this thesis, we set about showing that the proper synergy between theoretical anal- ysis and practical design can enhance the resiliency of networked systems. We illustrate the benefits of this synergy by presenting resiliency approaches that target the inter- and intra-networking levels. At the inter-networking level, we present CPuzzle as a means to protect the transport control protocol (TCP) connection establishment channel from state- exhaustion distributed denial of service attacks (DDoS). CPuzzle leverages client puzzles to limit the rate at which misbehaving users can establish TCP connections. We modeled the problem of determining the puzzle difficulty as a Stackleberg game and solve for the equilibrium strategy that balances the users’ utilizes against CPuzzle’s resilience capabilities. Furthermore, to handle volumetric DDoS attacks, we extend CPuzzle and implement Midgard, a cooperative approach that involves end-users in the process of tolerating and neutralizing DDoS attacks. Midgard is a middlebox that resides at the edge of an Internet service provider’s network and uses client puzzles at the IP level to allocate bandwidth to its users. At the intra-networking level, we present sShield, a game-theoretic network response engine that manipulates a network’s connectivity in response to an attacker who is moving laterally to compromise a high-value asset. To implement such decision making algorithms, we leverage the recent advances in software-defined networking (SDN) to collect logs and security alerts about the network and implement response actions. However, the programma- bility offered by SDN comes with an increased chance for design-time bugs that can have drastic consequences on the reliability and security of a networked system. We therefore introduce BiFrost, an open-source tool that aims to verify safety and security proper- ties about data-plane programs. BiFrost translates data-plane programs into functionally equivalent sequential circuits, and then uses well-established hardware reduction, abstrac- tion, and verification techniques to establish correctness proofs about data-plane programs. By focusing on those four key efforts, CPuzzle, Midgard, sShield, and BiFrost, we believe that this work illustrates the benefits that the synergy between theory and practice can bring into the world of resilient system design. This thesis is an attempt to pave the way for further cooperation and coordination between theoreticians and practitioners, in the hope of designing resilient networked systems

    Research and development for the data, trigger and control card in preparation for Hi-Lumi lhc

    Get PDF
    When the Large Hadron Collider (LHC) increases its luminosity by an order of magnitude in the coming decade, the experiments that sit upon it must also be upgraded to continue to their physics performance in the increasingly demanding environment. To achieve this, the Compact Muon Solenoid (CMS) experiment will make use of tracking information in the Level-1 trigger for the first time, meaning that track reconstruction must be achieved in less than 4 ÎŒs in an all-FPGA architecture. MUonE is an experiment aiming to make an accurate measurement of the the hadronic contribution to the anomalous magnetic moment of the muon. It will achieve this by making use of similar apparatus to that designed for CMS and benefit from the research and development efforts there. This thesis presents both development and testing work for the readout chain from tracker module to back-end processing card, as well as the results and analysis of a beam test used to validate this chain for both CMS and the MUonE experiment.Open Acces

    Vector-thread architecture and implementation

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 181-186).This thesis proposes vector-thread architectures as a performance-efficient solution for all-purpose computing. The VT architectural paradigm unifies the vector and multithreaded compute models. VT provides the programmer with a control processor and a vector of virtual processors. The control processor can use vector-fetch commands to broadcast instructions to all the VPs or each VP can use thread-fetches to direct its own control flow. A seamless intermixing of the vector and threaded control mechanisms allows a VT architecture to flexibly and compactly encode application parallelism and locality. VT architectures can efficiently exploit a wide variety of loop-level parallelism, including non-vectorizable loops with cross-iteration dependencies or internal control flow. The Scale VT architecture is an instantiation of the vector-thread paradigm designed for low-power and high-performance embedded systems. Scale includes a scalar RISC control processor and a four-lane vector-thread unit that can execute 16 operations per cycle and supports up to 128 simultaneously active virtual processor threads. Scale provides unit-stride and strided-segment vector loads and stores, and it implements cache refill/access decoupling. The Scale memory system includes a four-port, non-blocking, 32-way set-associative, 32 KB cache. A prototype Scale VT processor was implemented in 180 nm technology using an ASIC-style design flow. The chip has 7.1 million transistors and a core area of 16.6 mm2, and it runs at 260 MHz while consuming 0.4-1.1 W. This thesis evaluates Scale using a diverse selection of embedded benchmarks, including example kernels for image processing, audio processing, text and data processing, cryptography, network processing, and wireless communication.(cont.) Larger applications also include a JPEG image encoder and an IEEE 802.11 la wireless transmitter. Scale achieves high performance on a range of different types of codes, generally executing 3-11 compute operations per cycle. Unlike other architectures which improve performance at the expense of increased energy consumption, Scale is generally even more energy efficient than a scalar RISC processor.by Ronny Meir Krashinsky.Ph.D

    Design of large polyphase filters in the Quadratic Residue Number System

    Full text link
    corecore