High-performance acceleration of 2-D and 3-D CNNs on FPGAs using static block floating point
Over the past few years, 2-D convolutional neural networks (CNNs) have demonstrated great success in a wide range of 2-D computer vision applications, such as image classification and object detection. At the same time, 3-D CNNs, a variant of 2-D CNNs, have shown an excellent ability to analyze 3-D data, such as video and geometric data. However, the heavy algorithmic complexity of 2-D and 3-D CNNs imposes a substantial speed overhead on these networks, which limits their deployment in real-life applications. Although various domain-specific accelerators have been proposed to address this challenge, most of them focus only on accelerating 2-D CNNs, without considering their computational efficiency on 3-D CNNs. In this article, we propose a unified hardware architecture to accelerate both 2-D and 3-D CNNs with high hardware efficiency. Our experiments demonstrate that the proposed accelerator can achieve up to 92.4% and 85.2% multiply-accumulate efficiency on 2-D and 3-D CNNs, respectively. To improve the hardware performance, we propose a hardware-friendly quantization approach called static block floating point (BFP), which eliminates the frequent representation conversions required in traditional dynamic BFP arithmetic. Compared with integer linear quantization using a zero-point, static BFP quantization can decrease the logic resource consumption of the convolutional kernel design by nearly 50% on a field-programmable gate array (FPGA). Without time-consuming retraining, the proposed static BFP quantization can reduce the precision to an 8-bit mantissa with negligible accuracy loss. As different CNNs on our reconfigurable system require different hardware and software parameters to achieve optimal hardware performance and accuracy, we also propose an automatic tool for parameter optimization. Based on our hardware design and optimization, we demonstrate that the proposed accelerator can achieve 3.8-5.6 times higher energy efficiency than a graphics processing unit (GPU) implementation. Compared with state-of-the-art FPGA-based accelerators, our design achieves higher generality and 1.4-2.2 times higher resource efficiency on both 2-D and 3-D CNNs.
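The block floating point idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`calibrate_exponent`, `bfp_quantize`, `bfp_dot`), the rounding and clipping rules, and the reading of "static" as "the shared exponent is chosen once, offline, from calibration data" are all assumptions made for illustration. Each block stores signed 8-bit mantissas plus one shared power-of-two exponent, so the inner product reduces to integer multiply-accumulates followed by a single rescale.

```python
import numpy as np

def calibrate_exponent(samples):
    """Choose one shared exponent offline so that max|x| < 2**exponent.

    Assumption: this offline choice is what makes the scheme 'static'.
    """
    max_abs = float(np.max(np.abs(samples)))
    if max_abs == 0.0:
        return 0
    return int(np.floor(np.log2(max_abs))) + 1

def bfp_quantize(block, shared_exp, mantissa_bits=8):
    """Map a block of floats to signed integer mantissas with one shared scale."""
    # Value of one mantissa step for this block.
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    lo = -(2 ** (mantissa_bits - 1))
    hi = 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(np.asarray(block) / scale), lo, hi)
    return mantissas.astype(np.int64), scale

def bfp_dot(mant_a, scale_a, mant_b, scale_b):
    """Integer multiply-accumulate, then one floating-point rescale."""
    acc = int(np.dot(mant_a, mant_b))
    return acc * scale_a * scale_b

# Hypothetical example data, chosen to be exactly representable.
weights = np.array([0.5, -0.25, 0.125, 0.75])
activations = np.array([1.0, 2.0, -1.5, 0.5])

w_exp = calibrate_exponent(weights)
a_exp = calibrate_exponent(activations)
w_mant, w_scale = bfp_quantize(weights, w_exp)
a_mant, a_scale = bfp_quantize(activations, a_exp)
result = bfp_dot(w_mant, w_scale, a_mant, a_scale)
```

Because the exponents are fixed before inference in this sketch, no per-block exponent search or mantissa realignment happens at run time, which is one plausible reading of how the representation conversions of dynamic BFP are avoided.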
Effect of different drying methods on the protein and product quality of hairtail fish meat gel
Three different methods, namely hot air drying (HA), microwave vacuum drying (MV), and vacuum freeze drying (FD), were employed to investigate the effect of drying method on the quality of hairtail fish meat gel. Compared with HA and MV, FD samples showed better quality in terms of moisture content, water absorption index, and water solubility index, and had the highest overall acceptance in sensory evaluation. FD preserved the protein from degradation and formed an ordered porous microstructure. The nitrogen fraction assay revealed that protein was degraded into 40–100 kDa fragments during HA drying, whereas MV and FD left the protein almost unaffected. Overall, FD was the most suitable method for drying meat gel made from hairtail, followed by MV and HA.
F-E3D: FPGA-based acceleration of an efficient 3D convolutional neural network for human action recognition
Three-dimensional convolutional neural networks (3D CNNs) have demonstrated outstanding classification accuracy for human action recognition (HAR). However, the large number of computations and parameters in 3D CNNs limits their deployability in real-life applications. To address this challenge, this paper adopts an algorithm-hardware co-design method by proposing an efficient 3D CNN building unit called the 3D-1 bottleneck residual block (3D-1 BRB) at the algorithm level, and a corresponding FPGA-based hardware architecture called F-E3D at the hardware level. Based on 3D-1 BRB, a novel 3D CNN model called E3DNet is developed, which achieves a nearly 37 times reduction in model size and a 5% improvement in accuracy compared to standard 3D CNNs on the UCF101 dataset. Together with several hardware optimizations, including 3D fused BRB, online blocking and kernel reuse, the proposed F-E3D is nearly 13 times faster than a previous FPGA design for 3D CNNs, with performance and accuracy comparable to other state-of-the-art 3D CNN models on GPU platforms while requiring only 7% of their energy consumption.
Measurement of decays to baryon pairs
A sample of 3.95M decays registered in the BES detector is used
to study final states containing pairs of octet and decuplet baryons. We report
branching fractions for , ,
, ,
, ,
, and . These results
are compared to expectations based on the SU(3)-flavor symmetry, factorization,
and perturbative QCD. Comment: 22 pages, 21 figures, 4 tables
Measurements of the Cross Section for e+e- -> hadrons at Center-of-Mass Energies from 2 to 5 GeV
We report values of for 85 center-of-mass energies between
2 and 5 GeV measured with the upgraded Beijing Spectrometer at the Beijing
Electron-Positron Collider. Comment: 5 pages, 3 figures
First Measurement of the Branching Fraction of the Decay psi(2S) --> tau tau
The branching fraction of the psi(2S) decay into a tau pair has been measured
for the first time using the BES detector at the Beijing Electron-Positron
Collider. The result is ,
where the first error is statistical and the second is systematic. This value,
along with those for the branching fractions into e+e- and mu+mu- of this
resonance, satisfies well the relation predicted by the sequential lepton
hypothesis. Combining all these values with the leptonic width of the resonance,
the total width of the psi(2S) is determined to be keV. Comment: 9 pages, 2 figures
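For orientation, the two relations that the abstract above refers to are conventionally written as follows. This is a sketch in standard notation rather than a quotation from the paper: the symbols $B_{\ell\ell}$ (leptonic branching fraction), $\Gamma_{\ell\ell}$ (leptonic width), $m_\ell$ (lepton mass), $M$ (psi(2S) mass) and $v_\ell$ (phase-space factor) are introduced here for illustration, and no measured values are implied.

```latex
% Sequential lepton hypothesis: equal couplings, so the leptonic branching
% fractions differ only through a phase-space factor v_l.
\frac{B_{ee}}{v_e} = \frac{B_{\mu\mu}}{v_\mu} = \frac{B_{\tau\tau}}{v_\tau},
\qquad
v_\ell = \left(1 - \frac{4 m_\ell^2}{M^2}\right)^{1/2}
         \left(1 + \frac{2 m_\ell^2}{M^2}\right)

% Total width from a leptonic width and the matching branching fraction.
\Gamma_{\mathrm{tot}} = \frac{\Gamma_{\ell\ell}}{B_{\ell\ell}}
```

For the electron and muon, $v_\ell$ is very close to 1 at the psi(2S) mass, so the phase-space suppression matters mainly for the tau channel.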
Measurement of the Total Cross Section for Hadronic Production by e+e- Annihilation at Energies between 2.6-5 GeV
Using the upgraded Beijing Spectrometer (BESII), we have measured the total
cross section for e+e- annihilation into hadronic final states at
center-of-mass energies of 2.6, 3.2, 3.4, 3.55, 4.6 and 5.0 GeV. Values of ,
, are determined. Comment: Submitted to Phys. Rev. Lett.
Measurement of the Inclusive Charm Cross Section at 4.03 GeV and 4.14 GeV
The cross section for charmed meson production at 4.03 and 4.14
GeV has been measured with the Beijing Spectrometer. The measurement was made
using 22.3 pb^-1 of data collected at 4.03 GeV and 1.5 pb^-1
of data collected at 4.14 GeV. Inclusive observed cross sections for
the production of charged and neutral D mesons and momentum spectra are
presented. Observed cross sections were radiatively corrected to obtain tree
level cross sections. Measurements of the total hadronic cross section are
obtained from the charmed meson cross section and an extrapolation of results
from below the charm threshold. Comment: 11 pages, 13 figures. The top level tex file is paper.tex. It builds
the paper from other tex files in this .tar and the .eps files
Study of the P-wave charmonium state \chi_{cJ} in \psi(2S) decays
The processes , and have been studied using a sample of produced
decays. We determine the total width of the to be
MeV. We present the first
measurement of the branching fraction , where the first error is statistical and the
second one systematic. Branching fractions of and
are also reported. Comment: 10 pages, revtex, 3 figures, 2 tables
Phase Separation Models for Cuprate Stripe Arrays
An electronic phase separation model provides a natural explanation for a
large variety of experimental results in the cuprates, including evidence for
both stripes and larger domains, and a termination of the phase separation in
the slightly overdoped regime, when the average hole density equals that on the
charged stripes. Several models are presented for charged stripes, showing how
density waves, superconductivity, and strong correlations compete with quantum
size effects (QSEs) in narrow stripes. The energy bands associated with the
charged stripes develop in the middle of the Mott gap, and the splitting of
these bands can be understood by considering the QSE on a single ladder. Comment: significant revisions: includes island phase, 16 eps figures, revtex