Low power CMOS vision sensor for foreground segmentation
This thesis focuses on the design of a top-ranked background subtraction
algorithm, the Pixel-Based Adaptive Segmenter (PBAS), and its mapping onto a CMOS vision sensor with focal-plane
processing. The redesign of PBAS into its hardware-oriented version, HO-PBAS, has led to fewer
memories per pixel and a simpler overall model, at the cost of an acceptable loss of accuracy with respect
to its CPU counterpart. The thesis features two CMOS vision sensors. The first, HOPBAS1K, lays out a
24 x 56 pixel array on a mini-ASIC chip in standard 180 nm CMOS technology. The second, HOPBAS10K,
features an array of 98 x 98 pixels in the same standard 180 nm CMOS technology. The second chip fixes issues
found in the first and provides good hardware and background-segmentation performance metrics.
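At its core, PBAS keeps a small per-pixel memory of background samples and flags a pixel as foreground when too few stored samples match it; reducing that memory is exactly what HO-PBAS trades for focal-plane feasibility. Below is a minimal single-channel sketch of the mechanism, assuming a fixed decision threshold R and a fixed update rate (the real algorithm adapts both per pixel); all names and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def pbas_segment(frame, samples, R=30, n_min=2, update_prob=1 / 16):
    """Classify each pixel against its background-sample memory.

    frame:   (H, W) grayscale image
    samples: (N, H, W) per-pixel memory of N background samples
    Returns a boolean foreground mask; refreshes `samples` in place.
    """
    # Background if at least n_min stored samples lie within distance R.
    matches = (np.abs(samples - frame[None]) < R).sum(axis=0)
    fg = matches < n_min

    # Conservative update: only background pixels may overwrite one
    # randomly chosen stored sample, with low probability.
    update = (~fg) & (rng.random(frame.shape) < update_prob)
    idx = rng.integers(0, samples.shape[0])
    samples[idx][update] = frame[update]
    return fg

# Example: a static scene with one bright intruding patch.
background = np.full((8, 8), 100.0)
samples = np.repeat(background[None], 4, axis=0)
frame = background.copy()
frame[2:4, 2:4] = 200.0          # foreground object
mask = pbas_segment(frame, samples)
```

A smaller sample count N and a shared, fixed R are the kind of simplifications that cut per-pixel memory at a modest accuracy cost.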
Primitives and design of the intelligent pixel multimedia communicator
Communication systems are an ever more essential component of our modern global society. Mobile communication systems are still in a state of rapid advancement and growth. Technology is constantly evolving at a rapid pace in ever more diverse areas, and emerging mobile multimedia-based communication systems offer new challenges for both current and future technologies. To realise the full potential of mobile multimedia communication systems, there is a need to explore new options to solve some of the fundamental problems facing the technology. In particular, the complexity of such a system, within an infrastructure that is inherently limited by its power sources and has very restricted transmission bandwidth, demands new methodologies and approaches.
Micro/Nano Devices for Blood Analysis, Volume II
The development of micro- and nanodevices for blood analysis continues to be a growing interdisciplinary subject that demands the careful integration of different research fields. Following the success of the book “Micro/Nano Devices for Blood Analysis”, we invited more authors from the scientific community to participate and submit their research for a second volume. Researchers from different areas and backgrounds cooperated actively and submitted high-quality research, focusing on the latest advances and challenges in micro- and nanodevices for diagnostics and blood analysis; micro- and nanofluidics; technologies for flow visualization and diagnosis; biochips, organ-on-a-chip and lab-on-a-chip devices; and their applications to research and industry.
Simulation and implementation of novel deep learning hardware architectures for resource constrained devices
Corey Lammie designed mixed-signal memristive/complementary metal–oxide–semiconductor (CMOS) and field-programmable gate array (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems during both inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems.
Approximate computing: An integrated cross-layer framework
A new design approach, called approximate computing (AxC), leverages the flexibility provided by intrinsic application resilience to realize hardware or software implementations that are more efficient in energy or performance. Approximate computing techniques forsake exact (numerical or Boolean) equivalence in the execution of some of the application’s computations, while ensuring that the output quality is acceptable. While early efforts in approximate computing have demonstrated great potential, they consist of ad hoc techniques applied to a very narrow set of applications, leaving in question the applicability of approximate computing in a broader context.
The primary objective of this thesis is to develop an integrated cross-layer approach to approximate computing, and thereby establish its applicability to a broader range of applications. The proposed framework comprises three key components: (i) at the circuit level, systematic approaches to designing approximate circuits, i.e., circuits that realize a slightly modified function with improved efficiency; (ii) at the architecture level, the use of approximate circuits to build programmable approximate processors; and (iii) at the software level, methods to apply approximate computing to machine learning classifiers, an important class of applications utilized across the computing spectrum. Towards this end, the thesis extends the state of the art in approximate computing in the following important directions.
Synthesis of Approximate Circuits: First, the thesis proposes a rigorous framework for the automatic synthesis of approximate circuits, which are the hardware building blocks of approximate computing platforms. Designing approximate circuits involves making judicious changes to the function implemented by the circuit such that its hardware complexity is lowered without violating the specified quality constraint. Inspired by classical approaches to Boolean optimization in logic synthesis, the thesis proposes two synthesis tools called SALSA and SASIMI that are general, i.e., applicable to any given circuit and quality specification. The framework is further extended to automatically design quality-configurable circuits, which are approximate circuits with the capability to reconfigure their quality at runtime. Over a wide range of arithmetic circuits, complex modules and complete datapaths, the circuits synthesized using the proposed framework demonstrate significant benefits in area and energy.
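A concrete flavor of what such synthesis trades away: a classic hand-designed approximate circuit is the lower-part-OR adder, which replaces the carry chain of the k least-significant bits with plain OR gates. Here is a bit-level model in Python; this is a simplified variant (without the carry-estimation gate some designs add), with k and width chosen for illustration, and it is not a circuit SALSA or SASIMI would necessarily produce:

```python
def loa_add(a, b, k=4, width=16):
    """Lower-part-OR adder: OR the k low bits (cheap, approximate),
    add the remaining high bits exactly. Worst-case error < 2**k."""
    low_mask = (1 << k) - 1
    low = (a & low_mask) | (b & low_mask)   # no carry chain in the low part
    high = ((a >> k) + (b >> k)) << k       # exact addition of the high part
    return (high + low) & ((1 << width) - 1)

assert loa_add(7, 9) == 15                 # exact sum is 16: error of 1
assert loa_add(1000, 2000, k=0) == 3000    # k = 0 degenerates to exact
```

Dropping the low-order carry chain is where the area and energy savings come from, at the price of a small, bounded numerical error.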
Programmable AxC Processors: Next, the thesis extends approximate computing to the realm of programmable processors by introducing the concept of quality programmable processors (QPPs). A key principle of QPPs is that the notion of quality is explicitly codified in their HW/SW interface, i.e., the instruction set. Instructions in the ISA are extended with quality fields, enabling software to specify the accuracy level that must be met during their execution. The micro-architecture is designed with hardware mechanisms to understand these quality specifications and translate them into energy savings. As a first embodiment of QPPs, the thesis presents a quality programmable 1D/2D vector processor, QP-Vec, which contains a 3-tiered hierarchy of processing elements. Based on an implementation of QP-Vec with 289 processing elements, energy benefits of up to 2.5X are demonstrated across a wide range of applications.
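The quality-field idea can be made concrete in a few lines: below is a sketch of a quality-annotated add in which the field q names how many low-order bits the hardware may ignore, standing in for a lower-accuracy, lower-energy execution mode. This is illustrative only; QP-Vec's actual ISA encoding and micro-architectural mechanisms are not reproduced here:

```python
def q_add(a, b, q=0):
    """Quality-annotated add: q low-order bits of each operand are masked,
    modeling an approximate mode the micro-architecture could run cheaply.
    q = 0 requests exact execution."""
    mask = ~((1 << q) - 1)
    return (a & mask) + (b & mask)

assert q_add(13, 7) == 20        # exact
assert q_add(13, 7, q=2) == 16   # exact sum is 20: bounded error
```

Software raises q where the application tolerates error and keeps q = 0 where it does not, which is precisely the accuracy/energy contract the ISA extension expresses.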
Software and Algorithms for AxC: Finally, the thesis addresses the problem of applying approximate computing to an important class of applications, viz. machine learning classifiers such as deep learning networks. To this end, the thesis proposes two approaches: AxNN and scalable effort classifiers. Both approaches leverage domain-specific insights to transform a given application into an energy-efficient approximate version that meets a specified application output quality. In the context of deep learning networks, AxNN adapts backpropagation to identify neurons that contribute less significantly to the network's accuracy, approximates these neurons (e.g., by using lower precision), and incrementally re-trains the network to mitigate the impact of approximations on output quality. Scalable effort classifiers, on the other hand, leverage the heterogeneity in the inherent classification difficulty of inputs to dynamically modulate the effort expended by machine learning classifiers. This is achieved by building a chain of classifiers of progressively growing complexity (and accuracy) such that the number of stages used for classification scales with input difficulty. Scalable effort classifiers yield substantial energy benefits, as a majority of inputs in real-world datasets require very low effort. In summary, the concepts and techniques presented in this thesis broaden the applicability of approximate computing, taking a significant step towards bringing it to the mainstream.
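A scalable effort chain can be sketched directly from that description: cheap stages answer confident inputs early, and only hard inputs pay for the expensive final stage. The stage and confidence functions below are stand-ins, not the thesis's classifiers:

```python
def cascade_predict(x, stages, threshold=0.9):
    """Scalable-effort classification over a list of (predict, confidence)
    pairs, ordered cheapest to costliest. The first stage confident enough
    decides; the last stage always decides."""
    for predict, confidence in stages[:-1]:
        if confidence(x) >= threshold:
            return predict(x)
    final_predict, _ = stages[-1]
    return final_predict(x)

# Toy chain: a cheap sign test that is only confident far from zero,
# backed by a costlier (here: trivial) final stage.
stages = [
    (lambda x: int(x > 0), lambda x: 1.0 if abs(x) > 1 else 0.0),
    (lambda x: int(x >= 0), lambda x: 1.0),
]
assert cascade_predict(5, stages) == 1      # easy input, first stage decides
assert cascade_predict(-0.5, stages) == 0   # hard input, falls to last stage
```

The energy benefit follows directly from the control flow: if most inputs exit at an early stage, the average cost per classification approaches that of the cheapest classifier.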
Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications
With the advent of dedicated Deep Learning (DL) accelerators and neuromorphic
processors, new opportunities are emerging for applying deep and Spiking Neural
Network (SNN) algorithms to healthcare and biomedical applications at the edge.
This can facilitate the advancement of the medical Internet of Things (IoT)
systems and Point of Care (PoC) devices. In this paper, we provide a tutorial
describing how various technologies ranging from emerging memristive devices,
to established Field Programmable Gate Arrays (FPGAs), and mature Complementary
Metal Oxide Semiconductor (CMOS) technology can be used to develop efficient DL
accelerators to solve a wide variety of diagnostic, pattern recognition, and
signal processing problems in healthcare. Furthermore, we explore how spiking
neuromorphic processors can complement their DL counterparts for processing
biomedical signals. After providing the required background, we unify the
sparsely distributed research on neural network and neuromorphic hardware
implementations as applied to the healthcare domain. In addition, we benchmark
various hardware platforms by performing a biomedical electromyography (EMG)
signal processing task and drawing comparisons among them in terms of inference
delay and energy. Finally, we provide our analysis of the field and share a
perspective on the advantages, disadvantages, challenges, and opportunities
that different accelerators and neuromorphic processors introduce to healthcare
and biomedical domains. This paper can serve a large audience, ranging from
nanoelectronics researchers, to biomedical and healthcare practitioners in
grasping the fundamental interplay between hardware, algorithms, and clinical
adoption of these tools, as we shed light on the future of deep networks and
spiking neuromorphic processing systems as proponents for driving biomedical
circuits and systems forward.
Comment: Submitted to IEEE Transactions on Biomedical Circuits and Systems (21 pages, 10 figures, 5 tables).
Analog Front-End Circuits for Massive Parallel 3-D Neural Microsystems.
Understanding of brain dynamics has improved tremendously thanks to progress in neural recording techniques over the past five decades. The number of simultaneously recorded channels has doubled roughly every seven years, which implies that a recording system with a few thousand channels should be available within the next two decades. Nonetheless, a leap in the number of simultaneous channels has remained an unmet need due to many limitations, especially in front-end recording integrated circuits (ICs).
This research has focused on increasing the number of simultaneously recorded channels and providing modular design approaches to improve the integration and expansion of 3-D recording microsystems. Three analog front-ends (AFE) have been developed using extremely low-power and small-area circuit techniques on both the circuit and system levels. The three prototypes have investigated some critical circuit challenges in power, area, interface, and modularity.
The first AFE (16-channels) has optimized energy efficiency using techniques such as moderate inversion, minimized asynchronous interface for data acquisition, power-scalable sampling operation, and a wide configuration range of gain and bandwidth. Circuits in this part were designed in a 0.25μm CMOS process using a 0.9-V single supply and feature a power consumption of 4μW/channel and an energy-area efficiency of 7.51x10^15 in units of J^-1Vrms^-1mm^-2.
The second AFE (128-channels) provides the next level of scaling using dc-coupled analog compression techniques to reject the electrode offset and reduce the implementation area further. Signal processing techniques were also explored to transfer some computational power outside the brain. Circuits in this part were designed in a 180nm CMOS process using a 0.5-V single supply and feature a power consumption of 2.5μW/channel, and energy-area efficiency of 30.2x10^15 J^-1Vrms^-1mm^-2.
The last AFE (128-channels) shows another leap in neural recording using monolithic integration of recording circuits on the shanks of neural probes. Monolithic integration may be the most effective approach to allow simultaneous recording of more than 1,024 channels. The probe and circuits in this part were designed in a 150 nm SOI CMOS process using a 0.5-V single supply and feature a power consumption of only 1.4μW/channel and energy-area efficiency of 36.4x10^15 J^-1Vrms^-1mm^-2.
PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/98070/1/ashmouny_1.pd
Integrated Circuit Design in US High-Energy Physics
This whitepaper summarizes the status, plans, and challenges in the area of
integrated circuit design in the United States for future High Energy Physics
(HEP) experiments. It has been submitted to CPAD (Coordinating Panel for
Advanced Detectors) and the HEP Community Summer Study 2013 (Snowmass on the
Mississippi), held in Minnesota July 29 to August 6, 2013. A workshop titled
"US Workshop on IC Design for High Energy Physics" (HEPIC2013) was held May 30 to
June 1, 2013, at Lawrence Berkeley National Laboratory (LBNL). A draft of the
whitepaper was distributed to the attendees before the workshop, the content
was discussed at the meeting, and this document is the resulting final product.
The scope of the whitepaper includes the following topics: needs for IC
technologies to enable future experiments in the three HEP frontiers (Energy,
Cosmic, and Intensity); challenges in the different technology and
circuit-design areas and the related R&D needs; motivation for using different
fabrication technologies; outlook of future technologies, including 2.5D and 3D;
a survey of ICs used in current experiments and ICs targeted for approved or
proposed experiments; and IC design at US institutes and recommendations for
future collaboration.
Algorithmic Analysis Techniques for Molecular Imaging
This study addresses image processing techniques for two medical imaging
modalities: Positron Emission Tomography (PET) and Magnetic Resonance
Imaging (MRI), which can be used in studies of human body functions and
anatomy in a non-invasive manner.
In PET, the so-called Partial Volume Effect (PVE) is caused by low
spatial resolution of the modality. The efficiency of a set of PVE-correction
methods is evaluated in the present study. These methods use information
about tissue borders which have been acquired with the MRI technique. As
another technique, a novel method is proposed for MRI brain image
segmentation. A standard approach in brain MRI segmentation is to use spatial
prior information. While this works for adults and healthy neonates,
the large variations in premature infants preclude its direct application.
The proposed technique can be applied to both healthy and non-healthy
premature infant brain MR images. Diffusion Weighted Imaging (DWI) is
an MRI-based technique that can be used to create images for measuring
physiological properties of cells at the structural level. We optimise the
scanning parameters of DWI so that the required acquisition time can be
reduced while still maintaining good image quality.
In the present work, PVE correction methods and physiological DWI
models are evaluated in terms of the repeatability of their results. This gives
information on the reliability of the measures given by the methods. The
evaluations are done using physical phantom objects, correlation measurements
against expert segmentations, computer simulations with realistic noise
modelling, and repeated measurements conducted on real patients. In PET,
the applicability and selection of a suitable partial volume
correction method was found to depend on the target application. For MRI,
the data-driven segmentation offers an alternative when using spatial prior is
not feasible. For DWI, the distribution of b-values turns out to be a central
factor affecting the time-quality ratio of the DWI acquisition. An optimal
b-value distribution was determined. This helps to shorten the imaging time
without hampering the diagnostic accuracy.
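The b-value trade-off the abstract refers to can be seen in the standard mono-exponential DWI model S(b) = S0 * exp(-b * ADC): the spacing of the b-values determines how noise in the measured signals propagates into the diffusion estimate. Below is a two-point estimator as a sketch, using the standard textbook model rather than the thesis's specific physiological DWI models:

```python
import math

def adc_two_point(s1, s2, b1, b2):
    """Apparent diffusion coefficient from two DWI acquisitions, assuming
    the mono-exponential model S(b) = S0 * exp(-b * ADC). A wider b-value
    spacing (b2 - b1) divides the signal noise; a narrower one amplifies it."""
    return math.log(s1 / s2) / (b2 - b1)

# Synthetic check: S0 = 1000, true ADC = 1e-3 mm^2/s, b-values 0 and 1000 s/mm^2.
s_low = 1000.0
s_high = 1000.0 * math.exp(-1000 * 1e-3)
estimate = adc_two_point(s_low, s_high, 0, 1000)   # recovers ~1e-3
```

Since each acquisition at a given b-value costs scan time, choosing a b-value distribution that keeps this estimate stable under noise is exactly the time-quality optimisation described above.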
Insect-vision inspired collision warning vision processor for automobiles
Vision is expected to play an important role in car-safety enhancement. Imaging systems can be used to enlarge the driver's field of vision, for instance by capturing and displaying views of hidden areas around the car which the driver can analyze for safer decision-making. Vision systems go a step further: they can autonomously analyze the visual information, identify dangerous situations and prompt the delivery of warning signals, for instance in the case of road-lane departure, an overtaking car in the blind spot, or an object approaching on a collision course. Processing capabilities are also needed for applications viewing the car interior, such as "intelligent airbag systems" that base deployment decisions on passenger features. On-line processing of visual information for car safety involves multiple sensors and views, huge amounts of data per view and large frame rates. The associated computational load may be prohibitive for conventional processing architectures, so dedicated systems with embedded local processing capabilities may be needed to confront these challenges. This paper describes a dedicated sensory-processing architecture for collision warning which is inspired by insect vision. In particular, the paper relies on the exploitation of knowledge about the behavior of Locusta migratoria to develop dedicated chips and systems, which are integrated into model cars as well as into a commercial car (Volvo XC90) and tested to deliver collision warnings in real traffic scenarios. Funding: Gobierno de España TEC2006-15722; European Community IST:2001-3809.
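The locust-inspired principle behind such collision warning is looming detection: an object on a collision course expands in the image, and the ratio of its angular size to its expansion rate approximates the time to contact. A minimal sketch follows; the thresholds, frame interval, and size-based expansion measure are illustrative assumptions, not those of the paper's chip:

```python
def collision_warning(prev_size, curr_size, dt=1 / 25, tau_threshold=0.5):
    """Looming-based warning inspired by the locust LGMD neuron.

    prev_size, curr_size: apparent object size (e.g., in pixels) in two
    consecutive frames captured dt seconds apart.
    Warn when the estimated time to contact, tau = size / (d size / dt),
    drops below tau_threshold seconds.
    """
    expansion = (curr_size - prev_size) / dt
    if expansion <= 0:
        return False                 # shrinking or static: no threat
    tau = curr_size / expansion      # estimated time to contact
    return tau < tau_threshold

assert collision_warning(100, 120, dt=0.04) is True    # rapid expansion
assert collision_warning(100, 101, dt=0.04) is False   # slow expansion
```

Because the computation per pixel region is this light, it maps naturally onto focal-plane or other embedded processing, which is what makes the insect-vision model attractive for in-car hardware.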