Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors
Automated code generation and performance tuning techniques for concurrent architectures such as GPUs, Cell and FPGAs can provide integer-factor speedups over multi-core processor organizations for data-parallel, floating-point computation in SPICE model-evaluation. Our Verilog-AMS compiler produces code for parallel evaluation of non-linear circuit models suitable for use in SPICE simulations, where the same model is evaluated many times for all the devices in the circuit. Our compiler uses architecture-specific parallelization strategies (OpenMP for multi-core, PThreads for Cell, CUDA for GPU, statically scheduled VLIW for FPGA) when producing code for these different architectures. We automatically explore different implementation configurations (e.g. unroll factor, vector length) using our performance tuner to identify the best possible configuration for each architecture. We demonstrate speedups of 3-182× for a Xilinx Virtex-5 LX330T, 1.3-33× for an IBM Cell, and 3-131× for an NVIDIA 9600 GT GPU over a 3 GHz Intel Xeon 5160 implementation for a variety of single-precision device models.
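The configuration exploration the abstract describes can be sketched as an exhaustive sweep over tuning parameters. This is an illustrative toy (the function names and the synthetic cost model are assumptions, not the paper's tuner), but it shows the shape of the search over unroll factor and vector length:

```python
import itertools

def benchmark(unroll, vec_len):
    """Stand-in for timing one generated-code variant; a real tuner would
    compile and run the model-evaluation kernel with these parameters."""
    # Synthetic cost model: more unrolling and wider vectors help until
    # resource pressure (modeled as a penalty past unroll 8) dominates.
    cost = 1.0 / (unroll * vec_len)
    if unroll > 8:
        cost *= 3.0
    return cost

def tune(unrolls=(1, 2, 4, 8, 16), vec_lens=(1, 2, 4)):
    # Exhaustively explore the configuration space, as the tuner does per
    # target architecture, and keep the fastest point found.
    return min(itertools.product(unrolls, vec_lens),
               key=lambda cfg: benchmark(*cfg))

print(tune())  # -> (8, 4) under this synthetic cost model
```

A real tuner would replace `benchmark` with wall-clock measurements of the generated code on each target.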
The International Linear Collider
In this article, we describe the key features of the recently completed
technical design for the International Linear Collider (ILC), a 200-500 GeV
linear electron-positron collider (expandable to 1 TeV) that is based on 1.3
GHz superconducting radio-frequency (SCRF) technology. The machine parameters
and detector characteristics have been chosen to complement the physics of
the Large Hadron Collider, including the discovery of the Higgs boson, and to
further exploit this new particle-physics energy frontier with a precision
instrument.
The linear collider design is the result of nearly twenty years of R&D,
resulting in a mature conceptual design for the ILC project that reflects an
international consensus. We summarize the physics goals and capability of the
ILC, the enabling R&D and resulting accelerator design, as well as the concepts
for two complementary detectors. The ILC is technically ready to be proposed
and built as a next-generation lepton collider, perhaps in stages beginning
as a Higgs factory. Comment: 41 pages
Reliable and Energy Efficient MLC STT-RAM Buffer for CNN Accelerators
We propose a lightweight scheme in which the formation of a data block is changed so that it tolerates soft errors significantly better than the baseline. The key insight behind our work is that CNN weights are normalized between -1 and 1 after each convolutional layer, which leaves one bit unused in the half-precision floating-point representation. By taking advantage of the unused bit, we create a backup of the most significant bit to protect it against soft errors. Moreover, since in MLC STT-RAMs the cost of memory operations (read and write) and the reliability of a cell are content-dependent (some patterns draw larger current and take longer time, while being more susceptible to soft errors), we rearrange the data block to minimize the number of costly bit patterns. Combining these two techniques provides the same level of accuracy as an error-free baseline while improving read and write energy by 9% and 6%, respectively.
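The "unused bit" observation can be made concrete: in IEEE-754 binary16, any value with magnitude at most 1 has a biased exponent no larger than 01111, so the exponent MSB (bit 14) is always 0 and can hold a backup of the sign bit. The following is a minimal sketch of that idea, not the paper's exact encoding:

```python
import struct

def f16_bits(x):
    # binary16 layout: bit 15 = sign, bits 14-10 = exponent, bits 9-0 = mantissa.
    return struct.unpack('<H', struct.pack('<e', x))[0]

def bits_f16(b):
    return struct.unpack('<e', struct.pack('<H', b))[0]

def protect(x):
    """For |x| <= 1 the exponent MSB (bit 14) is always 0, so it is free;
    store a copy of the sign bit there as a backup."""
    b = f16_bits(x)
    assert b & 0x4000 == 0, "weight must lie in [-1, 1]"
    sign = (b >> 15) & 1
    return b | (sign << 14)

def recover(b):
    """Restore the sign bit from the backup copy and clear bit 14, undoing
    a soft error that flipped the stored sign bit."""
    backup = (b >> 14) & 1
    return bits_f16((b & 0x3FFF) | (backup << 15))

w = -0.625
stored = protect(w)
corrupted = stored ^ 0x8000   # soft error flips the sign bit in memory
print(recover(corrupted))     # -> -0.625, sign restored from the backup
```

This protects only the most significant (sign) bit, matching the abstract's claim of backing up the MSB; errors in lower-order bits perturb the weight far less.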
Next generation sequencing in cancer: opportunities and challenges for precision cancer medicine
Over the past decade, testing the genes of patients and their specific cancer types has become standardized
practice in medical oncology since somatic mutations, changes in gene expression and epigenetic
modifications are all hallmarks of cancer. However, while cancer genetic assessment has been limited to
single biomarkers to guide the use of therapies, improvements in nucleic acid sequencing technologies
and implementation of different genome analysis tools have enabled clinicians to detect these genomic
alterations and identify functional and disease-associated genomic variants. Next-generation sequencing
(NGS) technologies have provided clues about therapeutic targets and genomic markers for novel clinical
applications when standard therapy has failed. While Sanger sequencing, an accurate and sensitive
approach, allows for the identification of potential novel variants, it is however limited by the single
amplicon being interrogated. Similarly, quantitative and qualitative profiling of gene expression changes
also represents a challenge for the cancer field. Both RT-PCR and microarrays are efficient approaches,
but are limited to the genes present on the array or being assayed. This leaves vast swaths of the transcriptome,
including non-coding RNAs and other features, unexplored. With the advent of the ability to
collect and analyze genomic sequence data in a timely fashion and at an ever-decreasing cost, many of
these limitations have been overcome and are being incorporated into cancer research and diagnostics
giving patients and clinicians new hope for targeted and personalized treatment. Below we highlight
the various applications of next-generation sequencing in precision cancer medicine.
Versatile stochastic dot product circuits based on nonvolatile memories for high performance neurocomputing and neurooptimization.
The key operation in stochastic neural networks, which have become the state-of-the-art approach for solving problems in machine learning, information theory, and statistics, is a stochastic dot-product. While there have been many demonstrations of dot-product circuits and, separately, of stochastic neurons, an efficient hardware implementation combining both functionalities is still missing. Here we report compact, fast, energy-efficient, and scalable stochastic dot-product circuits based on either passively integrated metal-oxide memristors or embedded floating-gate memories. The circuit's high performance is due to its mixed-signal implementation, while efficient stochastic operation is achieved by utilizing the circuit's noise, intrinsic and/or extrinsic to the memory cell array. The dynamic scaling of weights, enabled by analog memory devices, allows for efficient realization of different annealing approaches to improve functionality. The proposed approach is experimentally verified for two representative applications, namely by implementing a neural network for solving a four-node graph-partitioning problem, and a Boltzmann machine with 10 input and 8 hidden neurons.
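The functional behavior of such a stochastic neuron can be sketched in software: the array computes the dot product, circuit noise makes the firing decision probabilistic, and scaling the weights acts as an annealing temperature. The names and the sigmoid firing probability below are illustrative assumptions, not the circuit's transfer function:

```python
import math
import random

def stochastic_neuron(weights, inputs, temperature=1.0, rng=random.random):
    """Fire (return 1) with probability sigmoid(w.x / T). Lowering T,
    which the hardware realizes by dynamically scaling the stored weights,
    hardens the neuron toward a deterministic threshold (annealing)."""
    activation = sum(w * x for w, x in zip(weights, inputs)) / temperature
    p_fire = 1.0 / (1.0 + math.exp(-activation))
    return 1 if rng() < p_fire else 0

random.seed(0)
w, x = [0.5, -0.3, 0.8], [1, 1, 0]
# w.x = 0.2; at T = 0.1 the effective activation is 2, so p ~ 0.88.
samples = [stochastic_neuron(w, x, temperature=0.1) for _ in range(1000)]
print(sum(samples) / 1000)
```

In the reported hardware the noise comes for free from the memory cell array rather than from a pseudo-random generator.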
A portable load cell for in-situ ore impact breakage testing
This paper discusses the design and characterisation of a short, and hence portable, impact load cell for in-situ quantification of ore breakage properties under impact loading conditions. Much literature has been published in the past two decades on impact load cells for ore breakage testing. It has been conclusively shown that such machines yield significant quantitative energy-fragmentation information about industrial ores. However, the documented load cells are all laboratory systems that are not adapted for in-situ testing because of their dimensions and operating requirements. The authors report on a new portable impact load cell designed specifically for in-situ testing. The load cell is 1.5 m in height and weighs 30 kg. Its physical and operating characteristics are detailed in the paper, including physical dimensions, calibration and signal deconvolution. Emphasis is placed on the deconvolution issue, which is significant for such a short load cell. Finally, it is conclusively shown that the short load cell is quantitatively as accurate as its larger laboratory analogues.
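The deconvolution problem the abstract emphasizes amounts to inverting the cell's response: the sensor records the impact force convolved with the cell's impulse response, and the force history must be recovered. A textbook time-domain sketch follows (the impulse response and the recursive method are illustrative; the paper's procedure for the short cell is more involved):

```python
def convolve(h, f):
    """What the load cell does physically: the impact force history f is
    smeared by the cell's impulse response h into the measured signal."""
    y = [0.0] * (len(h) + len(f) - 1)
    for i, hi in enumerate(h):
        for j, fj in enumerate(f):
            y[i + j] += hi * fj
    return y

def deconvolve(y, h):
    """Recover the force history by recursive time-domain deconvolution:
    f[n] = (y[n] - sum_{k>=1} h[k] * f[n-k]) / h[0], requiring h[0] != 0."""
    f = []
    for n in range(len(y) - len(h) + 1):
        acc = sum(h[k] * f[n - k] for k in range(1, min(len(h), n + 1)))
        f.append((y[n] - acc) / h[0])
    return f

h = [1.0, 0.5, 0.25]       # assumed impulse response of the cell
f = [0.0, 2.0, 3.0, 1.0]   # true impact force history
y = convolve(h, f)         # what the sensor records
print(deconvolve(y, h))    # -> [0.0, 2.0, 3.0, 1.0]
```

For a short cell the reflected stress waves overlap the primary signal, which is why deconvolution is harder than in the longer laboratory systems; real implementations also need regularization against measurement noise.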
XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference
Binary Neural Networks (BNNs) promise to deliver accuracy comparable to
conventional deep neural networks at a fraction of the cost in terms of memory
and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully
digital configurable hardware accelerator IP for BNNs, integrated within a
microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid
SRAM / standard cell memory. The XNE is able to fully compute convolutional and
dense layers in autonomy or in cooperation with the core in the MCU to realize
more complex behaviors. We show post-synthesis results in 65nm and 22nm
technology for the XNE IP and post-layout results in 22nm for the full MCU
indicating that this system can drop the energy cost per binary operation to
21.6 fJ at 0.4 V, while remaining flexible and performant enough to execute
state-of-the-art BNN topologies such as ResNet-34 in less than 2.2 mJ per
frame at 8.9 fps. Comment: 11 pages, 8 figures, 2 tables, 3 listings. Accepted
for presentation at CODES'18 and for publication in IEEE Transactions on
Computer-Aided Design of Circuits and Systems (TCAD) as part of the
ESWEEK-TCAD special issue.
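The arithmetic an XNOR engine accelerates is easy to state: with weights and activations restricted to {-1, +1} and encoded as {0, 1} bits, a dot product reduces to an XNOR followed by a popcount. A minimal sketch of that identity (the function names are illustrative, not the XNE's interface):

```python
def xnor_dot(a_bits, b_bits, n):
    """Binary dot product as an XNOR engine computes it: with {-1:0, +1:1}
    encoding, sum(a_i * b_i) == 2 * popcount(XNOR(a, b)) - n, since each
    agreeing bit contributes +1 and each disagreeing bit contributes -1."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # 1 wherever the bits agree
    return 2 * bin(xnor).count('1') - n

# Encode +/-1 vectors as bit masks and check against the direct product.
a = [+1, -1, +1, +1]
b = [+1, +1, -1, +1]
to_bits = lambda v: sum((x > 0) << i for i, x in enumerate(v))
print(xnor_dot(to_bits(a), to_bits(b), len(a)))  # -> 0
print(sum(x * y for x, y in zip(a, b)))          # -> 0
```

Replacing multiply-accumulate with XNOR-popcount is what makes the femtojoule-per-operation energy figures above attainable in digital logic.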