7,356 research outputs found
Recommended from our members
Application of Advanced Early Warning Systems with Adaptive Protection
This project developed and field-tested two methods of Adaptive Protection systems utilizing synchrophasor data. One method detects conditions of system stress that can lead to unintended relay operation, and initiates a supervisory signal to modify relay response in real time to avoid false trips. The second method detects the possibility of false trips of impedance relays as stable system swings âencroachâ on the relaysâ impedance zones, and produces an early warning so that relay engineers can re-evaluate relay settings. In addition, real-time synchrophasor data produced by this project was used to develop advanced visualization techniques for display of synchrophasor data to utility operators and engineers
Energy efficient hardware acceleration of multimedia processing tools
The world of mobile devices is experiencing an ongoing trend of feature enhancement and generalpurpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks Based on the survey that this thesis presents on modem video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores.
To prove these conclusions, the work m this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant pnor art in the literature.
The SA-DCT/IDCT technologies are instances of a more general computation - namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions .Results show an improvement on state of the art algorithms with future potential for even greater savings
Recommended from our members
ANALOG SIGNAL PROCESSING SOLUTIONS AND DESIGN OF MEMRISTOR-CMOS ANALOG CO-PROCESSOR FOR ACCELERATION OF HIGH-PERFORMANCE COMPUTING APPLICATIONS
Emerging applications in the field of machine vision, deep learning and scientific simulation require high computational speed and are run on platforms that are size, weight and power constrained. With the transistor scaling coming to an end, existing digital hardware architectures will not be able to meet these ever-increasing demands. Analog computation with its rich set of primitives and inherent parallel architecture can be faster, more efficient and compact for some of these applications. The major contribution of this work is to show that analog processing can be a viable solution to this problem. This is demonstrated in the three parts of the dissertation.
In the first part of the dissertation, we demonstrate that analog processing can be used to solve the problem of stereo correspondence. Novel modifications to the algorithms are proposed which improves the computational speed and makes them efficiently implementable in analog hardware. The analog domain implementation provides further speedup in computation and has lower power consumption than a digital implementation.
In the second part of the dissertation, a prototype of an analog processor was developed using commercially available off-the-shelf components. The focus was on providing experimental results that demonstrate functionality and to show that the performance of the prototype for low-level and mid-level image processing tasks is equivalent to a digital implementation. To demonstrate improvement in speed and power consumption, an integrated circuit design of the analog processor was proposed, and it was shown that such an analog processor would be faster than state-of-the-art digital and other analog processors.
In the third part of the dissertation, a memristor-CMOS analog co-processor that can perform floating point vector matrix multiplication (VMM) is proposed. VMM computation underlies some of the major applications. To demonstrate the working of the analog co-processor at a system level, a new tool called PSpice Systems Option is used. It is shown that the analog co-processor has a superior performance when compared to the projected performances of digital and analog processors. Using the new tool, various application simulations for image processing and solution to partial differential equations are performed on the co-processor model
SYSTEM-ON-A-CHIP (SOC)-BASED HARDWARE ACCELERATION FOR HUMAN ACTION RECOGNITION WITH CORE COMPONENTS
Today, the implementation of machine vision algorithms on embedded platforms or in portable systems is growing rapidly due to the demand for machine vision in daily human life. Among the applications of machine vision, human action and activity recognition has become an active research area, and market demand for providing integrated smart security systems is growing rapidly. Among the available approaches, embedded vision is in the top tier; however, current embedded platforms may not be able to fully exploit the potential performance of machine vision algorithms, especially in terms of low power consumption. Complex algorithms can impose immense computation and communication demands, especially action recognition algorithms, which require various stages of preprocessing, processing and machine learning blocks that need to operate concurrently. The market demands embedded platforms that operate with a power consumption of only a few watts. Attempts have been mad to improve the performance of traditional embedded approaches by adding more powerful processors; this solution may solve the computation problem but increases the power consumption. System-on-a-chip eld-programmable gate arrays (SoC-FPGAs) have emerged as a major architecture approach for improving power eciency while increasing computational performance. In a SoC-FPGA, an embedded processor and an FPGA serving as an accelerator are fabricated in the same die to simultaneously improve power consumption and performance. Still, current SoC-FPGA-based vision implementations either shy away from supporting complex and adaptive vision algorithms or operate at very limited resolutions due to the immense communication and computation demands. The aim of this research is to develop a SoC-based hardware acceleration workflow for the realization of advanced vision algorithms. Hardware acceleration can improve performance for highly complex mathematical calculations or repeated functions. The performance of a SoC system can thus be improved by using hardware acceleration method to accelerate the element that incurs the highest performance overhead. The outcome of this research could be used for the implementation of various vision algorithms, such as face recognition, object detection or object tracking, on embedded platforms. The contributions of SoC-based hardware acceleration for hardware-software codesign platforms include the following: (1) development of frameworks for complex human action recognition in both 2D and 3D; (2) realization of a framework with four main implemented IPs, namely, foreground and background subtraction (foreground probability), human detection, 2D/3D point-of-interest detection and feature extraction, and OS-ELM as a machine learning algorithm for action identication; (3) use of an FPGA-based hardware acceleration method to resolve system bottlenecks and improve system performance; and (4) measurement and analysis of system specications, such as the acceleration factor, power consumption, and resource utilization. Experimental results show that the proposed SoC-based hardware acceleration approach provides better performance in terms of the acceleration factor, resource utilization and power consumption among all recent works. In addition, a comparison of the accuracy of the framework that runs on the proposed embedded platform (SoCFPGA) with the accuracy of other PC-based frameworks shows that the proposed approach outperforms most other approaches
FPGA acceleration of DNA sequence alignment: design analysis and optimization
Existing FPGA accelerators for short read mapping often fail to utilize the complete biological information in sequencing data for simple hardware design, leading to missed or incorrect alignment. In this work, we propose a runtime reconfigurable alignment pipeline that considers all information in sequencing data for the biologically accurate acceleration of short read mapping. We focus our efforts on accelerating two string matching techniques: FM-index and the Smith-Waterman algorithm with the affine-gap model which are commonly used in short read mapping. We further optimize the FPGA hardware using a design analyzer and merger to improve alignment performance. The contributions of this work are as follows.
1. We accelerate the exact-match and mismatch alignment by leveraging the FM-index technique. We optimize memory access by compressing the data structure and interleaving the access with multiple short reads. The FM-index hardware also considers complete information in the read data to maximize accuracy.
2. We propose a seed-and-extend model to accelerate alignment with indels. The FM-index hardware is extended to support the seeding stage while a Smith-Waterman implementation with the affine-gap model is developed on FPGA for the extension stage. This model can improve the efficiency of indel alignment with comparable accuracy versus state-of-the-art software.
3. We present an approach for merging multiple FPGA designs into a single hardware design, so that multiple place-and-route tasks can be replaced by a single task to speed up functional evaluation of designs. We first experiment with this approach to demonstrate its feasibility for different designs. Then we apply this approach to optimize one of the proposed FPGA aligners for better alignment performance.Open Acces
Protein Alignment Systolic Array Throughput Optimization
Protein comparison is gaining importance year after year since it has been demonstrated that biologists can find cor- relation between different species, or genetic mutations that can lead to cancer and genetic diseases. Protein sequence alignment is the most computational intensive task when performing protein comparison. In order to speed-up alignment, dedicated processors that can perform different computations in parallel have been designed. Among them, the best performance have been achieved using Systolic Arrays. However, when the Processing Elements of the Systolic Array have an internal loop, performance could be highly reduced. In this work we present an architectural strategy to address this problem applying pipeline interleaving; this strategy is applied to a Systolic Array for Smith Waterman algorithm that we designed. Results encourage the adoption of pipeline interleaving for parallel circuits with loop based Processing Elements. We demonstrate that important benefits in terms of higher operating frequency can be derived without so relevant costs as increased complexity, area and power required
The prospects of quantum computing in computational molecular biology
Quantum computers can in principle solve certain problems exponentially more
quickly than their classical counterparts. We have not yet reached the advent
of useful quantum computation, but when we do, it will affect nearly all
scientific disciplines. In this review, we examine how current quantum
algorithms could revolutionize computational biology and bioinformatics. There
are potential benefits across the entire field, from the ability to process
vast amounts of information and run machine learning algorithms far more
efficiently, to algorithms for quantum simulation that are poised to improve
computational calculations in drug discovery, to quantum algorithms for
optimization that may advance fields from protein structure prediction to
network analysis. However, these exciting prospects are susceptible to "hype",
and it is also important to recognize the caveats and challenges in this new
technology. Our aim is to introduce the promise and limitations of emerging
quantum computing technologies in the areas of computational molecular biology
and bioinformatics.Comment: 23 pages, 3 figure
AirSync: Enabling Distributed Multiuser MIMO with Full Spatial Multiplexing
The enormous success of advanced wireless devices is pushing the demand for
higher wireless data rates. Denser spectrum reuse through the deployment of
more access points per square mile has the potential to successfully meet the
increasing demand for more bandwidth. In theory, the best approach to density
increase is via distributed multiuser MIMO, where several access points are
connected to a central server and operate as a large distributed multi-antenna
access point, ensuring that all transmitted signal power serves the purpose of
data transmission, rather than creating "interference." In practice, while
enterprise networks offer a natural setup in which distributed MIMO might be
possible, there are serious implementation difficulties, the primary one being
the need to eliminate phase and timing offsets between the jointly coordinated
access points.
In this paper we propose AirSync, a novel scheme which provides not only time
but also phase synchronization, thus enabling distributed MIMO with full
spatial multiplexing gains. AirSync locks the phase of all access points using
a common reference broadcasted over the air in conjunction with a Kalman filter
which closely tracks the phase drift. We have implemented AirSync as a digital
circuit in the FPGA of the WARP radio platform. Our experimental testbed,
comprised of two access points and two clients, shows that AirSync is able to
achieve phase synchronization within a few degrees, and allows the system to
nearly achieve the theoretical optimal multiplexing gain. We also discuss MAC
and higher layer aspects of a practical deployment. To the best of our
knowledge, AirSync offers the first ever realization of the full multiuser MIMO
gain, namely the ability to increase the number of wireless clients linearly
with the number of jointly coordinated access points, without reducing the per
client rate.Comment: Submitted to Transactions on Networkin
- âŠ