2,124 research outputs found
High-Speed and Low-Energy On-Chip Communication Circuits.
Continuous technology scaling sharply reduces transistor delays, while fixed-length global wire delays have increased because the smaller wiring pitch brings higher resistance and coupling capacitance. Due to this ever-growing gap, long on-chip interconnects pose well-known latency, bandwidth, and energy challenges to high-performance VLSI systems. Repeaters effectively mitigate wire RC effects but do little to improve their energy costs. Moreover, the increased complexity and high level of integration require higher wire densities, worsening crosstalk noise and power consumption of conventionally repeated interconnects.
Such increasing concerns in global on-chip wires motivate circuits to improve wire performance and energy while reducing the number of repeaters. This work presents circuit techniques and investigation for high-performance and energy-efficient on-chip communication in the aspects of encoding, data compression, self-timed current injection, signal pre-emphasis, low-swing signaling, and technology mapping. The improved bus designs also consider the constraints of robust operation and performance/energy gains across process corners and design space. Measurement results from 5mm links on 65nm and 90nm prototype chips validate 2.5-3X improvement in energy-delay product. Ph.D. Electrical Engineering. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/75800/1/jseo_1.pd
Comprehensive Evaluation of OpenCL-Based CNN Implementations for FPGAs
Deep learning has significantly advanced the state of the
art in artificial intelligence, gaining wide popularity from both industry
and academia. Special interest is around Convolutional Neural Networks
(CNN), which take inspiration from the hierarchical structure
of the visual cortex, to form deep layers of convolutional operations,
along with fully connected classifiers. Hardware implementations of these
deep CNN architectures are challenged by memory bottlenecks, because their
many convolution and fully-connected layers demand a large amount of
communication for parallel computation. Multi-core CPU
based solutions have demonstrated their inadequacy for this problem
due to the memory wall and low parallelism. Many-core GPU architectures
show superior performance but they consume high power and also
have memory constraints due to inconsistencies between cache and main
memory. OpenCL is commonly used to describe these architectures for
their execution on GPGPUs or FPGAs. FPGA design solutions are also
actively being explored, which allow implementing the memory hierarchy
using embedded parallel BlockRAMs. This boosts the parallel use
of shared memory elements between multiple processing units, avoiding
data replication and inconsistencies. This makes FPGAs potentially
powerful solutions for real-time classification with CNNs. In this
paper, the OpenCL co-design frameworks adopted by both Altera and Xilinx
as pseudo-automatic development solutions are evaluated. A comprehensive
evaluation and comparison for a 5-layer deep CNN is presented.
Hardware resources, temporal performance and the OpenCL architecture
for CNNs are discussed. Xilinx demonstrates faster synthesis, better
FPGA resource utilization and more compact boards. Altera provides
multi-platform tools, a mature design community and better execution
times. Ministerio de Economía y Competitividad TEC2016-77785-
Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs
Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. Special interest is around Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex, to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged by memory bottlenecks, because their many convolution and fully-connected layers demand a large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded BlockRAM. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replication and inconsistencies. This makes FPGAs potentially powerful solutions for real-time classification with CNNs. Both Altera and Xilinx have adopted the OpenCL co-design framework from GPUs for FPGA designs as a pseudo-automatic development solution. In this paper, a comprehensive evaluation and comparison of the Altera and Xilinx OpenCL frameworks for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platform tools, a mature design community and better execution times.
Algorithm and Hardware Design of Discrete-Time Spiking Neural Networks Based on Back Propagation with Binary Activations
We present a new back propagation based training algorithm for discrete-time
spiking neural networks (SNN). Inspired by recent deep learning algorithms on
binarized neural networks, binary activation with a straight-through gradient
estimator is used to model the leaky integrate-and-fire spiking neuron, overcoming
the difficulty in training SNNs using back propagation. Two SNN training
algorithms are proposed: (1) SNN with discontinuous integration, which is
suitable for rate-coded input spikes, and (2) SNN with continuous integration,
which is more general and can handle input spikes with temporal information.
Neuromorphic hardware designed in 40nm CMOS exploits the spike sparsity and
demonstrates high classification accuracy (>98% on MNIST) and low energy
(48.4-773 nJ/image). Comment: 2017 IEEE Biomedical Circuits and Systems (BioCAS
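The idea in the abstract above, a binary spike activation trained through a straight-through gradient estimator, can be illustrated with a minimal sketch. This is not the authors' algorithm or hardware: the function names, the leak factor, the threshold, the reset-to-zero behaviour, and the gradient window are all assumptions chosen only for illustration.

```python
# Illustrative sketch only: every name and constant here is an assumption,
# not taken from the paper.

def lif_forward(inputs, weights, leak=0.9, threshold=1.0):
    """Run one discrete-time leaky integrate-and-fire neuron.

    inputs  -- list of time steps, each a list of 0/1 input spikes
    weights -- one synaptic weight per input
    Returns (output spikes, membrane potentials before thresholding).
    """
    v = 0.0
    spikes, potentials = [], []
    for x_t in inputs:
        # Leak the previous potential, then integrate weighted input spikes.
        v = leak * v + sum(w * x for w, x in zip(weights, x_t))
        potentials.append(v)
        # Binary activation: emit a spike when the threshold is reached.
        s = 1 if v >= threshold else 0
        spikes.append(s)
        if s:
            v = 0.0  # assumed reset-to-zero after firing
    return spikes, potentials


def ste_grad(v, upstream, threshold=1.0, window=1.0):
    """Straight-through estimate of d(loss)/d(v) for the step activation.

    The true derivative of the step is zero almost everywhere, so the
    upstream gradient is passed through unchanged when v lies within
    `window` of the threshold and is blocked otherwise.
    """
    return upstream if abs(v - threshold) <= window else 0.0


if __name__ == "__main__":
    spikes, potentials = lif_forward([[1, 0], [1, 1], [0, 0], [1, 1]], [0.6, 0.5])
    print(spikes)
    print([ste_grad(v, 0.5) for v in potentials])
```

In the forward pass the neuron behaves as a hard threshold, so its output stays binary; only the backward pass substitutes the pass-through gradient, which is what makes back propagation usable despite the non-differentiable spike.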
Clinical characteristics and treatment modalities of vulvovaginal atrophy in genitourinary syndrome of menopause
Background: Genitourinary syndrome of menopause (GSM) causes symptoms such as vaginal dryness, dysuria, recurrent urinary tract infection, and urinary urgency that may affect daily activities, sexual relationships, and overall quality of life. The aim of the study was to describe the clinical characteristics of vulvovaginal atrophy (VVA) patients in South Korea and the effectiveness as well as complications of the currently used low dose estrogen vaginal suppository. Methods: 52 women who visited the outpatient gynecology clinic of the National Health Insurance Service Ilsan Hospital from January 2018 to December 2019 were recruited as study subjects. For the analysis of the clinical characteristics, subjective symptoms described in the patient’s own words, such as vaginal dryness, pain, dysuria, dyspareunia, or no symptoms at all, were included. Objective signs such as thinning of vaginal rugae, mucosal dryness, mucosal fragility, and the presence of petechiae were recorded. Results: Vaginal dryness was the most common complaint (92.3%). Thinning of the vaginal rugae was the most commonly noted objective sign (73.1%). Of the 52 subjects, 31 (59.6%) refrained from using the low dose estrogen vaginal suppository. The most common reason for not being able to use the suppository was the inability to insert the suppository (32.3%). Conclusions: Although patient-reported symptoms and objective findings on physical examination are two components in diagnosing VVA, further study is warranted to establish more objective and discriminatory diagnostic criteria for VVA. As the only available treatment modality was the low dose vaginal estrogen suppository, comparison with other treatment modalities was not available.
Improved application of the electrophoretic tissue clearing technology, CLARITY, to intact solid organs including brain, pancreas, liver, kidney, lung, and intestine
Background: Mapping of tissue structure at the cellular, circuit, and organ-wide scale is important for understanding physiological and biological functions. A bio-electrochemical technique known as CLARITY, used for three-dimensional anatomical and phenotypical mapping within transparent intact tissues, has been recently developed. This method provided a major advance in understanding the structure-function relationships in circuits of the nervous system and organs by using whole-body clearing. Thus, in the present study, we aimed to improve the original CLARITY procedure and developed specific CLARITY protocols for various intact organs. Results: We determined the optimal conditions for reducing bubble formation, discoloration, and deposition of black particles on the surface of tissue, which allowed production of clearer organ images. We also determined the appropriate replacement cycles of clearing solution for each type of organ, and convincingly demonstrated that 250–280 mA is the ideal range of electrical current for tissue clearing. We then obtained each type of cleared organ, including brain, pancreas, liver, lung, kidney, and intestine. Additionally, we obtained images of axon fibers of the hippocampal region, the Purkinje layer of the cerebellum, and the vessels and cellular nuclei of the pancreas. Conclusions: CLARITY is an innovative biochemical technology for the structural and molecular analysis of various types of tissue. We developed improved CLARITY methods for clearing of the brain, pancreas, lung, intestine, liver, and kidney, and identified the appropriate experimental conditions for clearing of each specific tissue type. These optimized methods will be useful for the application of CLARITY to various types of organs. Electronic supplementary material: The online version of this article (doi:10.1186/s12861-014-0048-3) contains supplementary material, which is available to authorized users.
Tissue- and Stage-specific Expression of Two Lipophorin Receptor Variants with Seven and Eight Ligand-binding Repeats in the Adult Mosquito
We identified two splice variants of lipophorin receptor (LpR) gene products specific to the mosquito fat body (AaLpRfb) and ovary (AaLpRov) with respective molecular masses of 99.3 and 128.9 kDa. Each LpR variant encodes a member of the low density lipoprotein receptor family with five characteristic domains: 1) ligand recognition, 2) epidermal growth factor precursor, 3) putative O-linked sugar, 4) single membrane-spanning domains, and 5) the cytoplasmic tail with a highly conserved internalization signal FDNPVY. Proposed phylogenetic relationships among low density lipoprotein receptor superfamily members suggest that the LpRs of insects are more closely related to vertebrate low density lipoprotein receptors and very low density lipoprotein receptor/vitellogenin receptor than to insect vitellogenin receptor/yolk protein receptors. The two mosquito LpR isoforms differ in their amino termini, ligand-binding domains, and O-linked sugar domains, which are generated by differential splicing. Polymerase chain reaction and Southern blot hybridization analyses show that these two transcripts originated from a single gene. Significantly, the putative ligand-binding domain consists of seven and eight complement-type, cysteine-rich repeats in AaLpRfb and AaLpRov, respectively. The seven cysteine-rich repeats in AaLpRfb are identical to the second through eighth repeats of AaLpRov. Previous analyses (1) have indicated that the AaLpRov transcript is present exclusively in ovarian germ-line cells, nurse cells, and oocytes throughout the previtellogenic and vitellogenic stages, with the peak at 24–30 h after blood meal, coincident with the peak of yolk protein uptake. In contrast, the fat body-specific AaLpRfb transcript expression is restricted to the postvitellogenic period, during which yolk protein production is terminated and the fat body is transformed to a storage depot of lipid, carbohydrate, and protein.