644 research outputs found

    High-performance acceleration of 2-D and 3D CNNs on FPGAs using static block floating point

    Over the past few years, 2-D convolutional neural networks (CNNs) have demonstrated great success in a wide range of 2-D computer vision applications, such as image classification and object detection. At the same time, 3-D CNNs, as a variant of 2-D CNNs, have shown an excellent ability to analyze 3-D data, such as video and geometric data. However, the heavy algorithmic complexity of 2-D and 3-D CNNs substantially slows these networks down, which limits their deployment in real-life applications. Although various domain-specific accelerators have been proposed to address this challenge, most of them focus only on accelerating 2-D CNNs, without considering their computational efficiency on 3-D CNNs. In this article, we propose a unified hardware architecture to accelerate both 2-D and 3-D CNNs with high hardware efficiency. Our experiments demonstrate that the proposed accelerator can achieve up to 92.4% and 85.2% multiply-accumulate efficiency on 2-D and 3-D CNNs, respectively. To improve the hardware performance, we propose a hardware-friendly quantization approach called static block floating point (BFP), which eliminates the frequent representation conversions required in traditional dynamic BFP arithmetic. Compared with integer linear quantization using a zero-point, static BFP quantization can decrease the logic resource consumption of the convolutional kernel design by nearly 50% on a field-programmable gate array (FPGA). Without time-consuming retraining, the proposed static BFP quantization is able to reduce the precision to an 8-bit mantissa with negligible accuracy loss. As different CNNs on our reconfigurable system require different hardware and software parameters to achieve optimal hardware performance and accuracy, we also propose an automatic tool for parameter optimization. Based on our hardware design and optimization, we demonstrate that the proposed accelerator can achieve 3.8-5.6 times higher energy efficiency than a graphics processing unit (GPU) implementation. Compared with state-of-the-art FPGA-based accelerators, our design achieves higher generality and up to 1.4-2.2 times higher resource efficiency on both 2-D and 3-D CNNs
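
The block floating point idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the calibration step, and the choice of one shared power-of-two exponent per block are assumptions made for illustration. The sketch assumes that "static" means the shared exponent is fixed offline from calibration data and then reused, so no exponent has to be recomputed at run time.

```python
import math

def shared_exponent(block, mantissa_bits=8):
    """Smallest power-of-two exponent that lets max |x| fit a signed mantissa.

    Hypothetical helper: in a static scheme this would run once, offline.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0
    limit = 2 ** (mantissa_bits - 1) - 1  # 127 for an 8-bit mantissa
    return math.ceil(math.log2(max_abs / limit))

def quantize_block(block, exponent, mantissa_bits=8):
    """Map floats to integer mantissas that all share one exponent."""
    limit = 2 ** (mantissa_bits - 1) - 1
    scale = 2.0 ** exponent
    # Values outside the calibrated range saturate at +/- limit.
    return [max(-limit, min(limit, round(x / scale))) for x in block]

def dequantize_block(mantissas, exponent):
    """Recover approximate floats from mantissas and the shared exponent."""
    scale = 2.0 ** exponent
    return [m * scale for m in mantissas]

def bfp_dot(mant_a, exp_a, mant_b, exp_b):
    """Dot product: integer multiply-accumulate, then one exponent addition."""
    acc = sum(a * b for a, b in zip(mant_a, mant_b))
    return acc * 2.0 ** (exp_a + exp_b)

# Offline: fix the exponent from (made-up) calibration data.
calibration = [0.5, -1.75, 0.03, 1.2]
exp_static = shared_exponent(calibration)

# Run time: reuse the fixed exponent for new data.
activations = [0.25, -0.8, 1.0]
mantissas = quantize_block(activations, exp_static)
restored = dequantize_block(mantissas, exp_static)
```

Because every value in a block shares one exponent, the inner loop of `bfp_dot` uses only integer arithmetic, which is the property that makes this kind of format attractive for hardware multiply-accumulate units.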

    Effect of different drying methods on the protein and product quality of hairtail fish meat gel

    Three different methods, namely hot air drying (HA), microwave vacuum drying (MV), and vacuum freeze drying (FD), were employed to investigate the effect of drying method on the quality of hairtail fish meat gel. Compared with HA and MV samples, FD samples showed better quality in terms of moisture content, water absorption index, and water solubility index, and had the highest overall acceptance in sensory evaluation. FD preserved the protein from degradation and formed an ordered porous microstructure. The nitrogen fraction assay revealed that protein was degraded into 40–100 kDa fragments during HA drying, whereas it was almost unaffected by MV and FD. Overall, FD was the most suitable method for drying meat gel made from hairtail, followed by MV and HA

    F-E3D: FPGA-based acceleration of an efficient 3D convolutional neural network for human action recognition

    Three-dimensional convolutional neural networks (3D CNNs) have demonstrated their outstanding classification accuracy for human action recognition (HAR). However, the large number of computations and parameters in 3D CNNs limits their deployability in real-life applications. To address this challenge, this paper adopts an algorithm-hardware co-design method by proposing an efficient 3D CNN building unit called 3D-1 bottleneck residual block (3D-1 BRB) at the algorithm level, and a corresponding FPGA-based hardware architecture called F-E3D at hardware level. Based on 3D-1 BRB, a novel 3D CNN model called E3DNet is developed, which achieves nearly 37 times reduction in model size and 5% improvement in accuracy compared to standard 3D CNNs on the UCF101 dataset. Together with several hardware optimizations, including 3D fused BRB, online blocking and kernel reuse, the proposed F-E3D is nearly 13 times faster than a previous FPGA design for 3D CNNs, with performance and accuracy comparable to other state-of-the-art 3D CNN models on GPU platforms while requiring only 7% of their energy consumption

    Measurement of $\psi(2S)$ decays to baryon pairs

    A sample of 3.95M $\psi(2S)$ decays registered in the BES detector is used to study final states containing pairs of octet and decuplet baryons. We report branching fractions for $\psi(2S)\to p\bar{p}$, $\Lambda\bar{\Lambda}$, $\Sigma^0\bar{\Sigma}{}^0$, $\Xi^-\bar{\Xi}{}^+$, $\Delta^{++}\bar{\Delta}{}^{--}$, $\Sigma^+(1385)\bar{\Sigma}{}^-(1385)$, $\Xi^0(1530)\bar{\Xi}{}^0(1530)$, and $\Omega^-\bar{\Omega}{}^+$. These results are compared to expectations based on the SU(3)-flavor symmetry, factorization, and perturbative QCD. Comment: 22 pages, 21 figures, 4 tables

    Measurements of the Cross Section for e+e- -> hadrons at Center-of-Mass Energies from 2 to 5 GeV

    We report values of $R = \sigma(e^+e^-\to \mathrm{hadrons})/\sigma(e^+e^-\to\mu^+\mu^-)$ for 85 center-of-mass energies between 2 and 5 GeV measured with the upgraded Beijing Spectrometer at the Beijing Electron-Positron Collider. Comment: 5 pages, 3 figures
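
As a rough illustration of how the ratio R in the abstract above is formed, the following sketch divides a hadronic cross section by the lowest-order QED cross section for muon-pair production, 4πα²/(3s). The function names are hypothetical, the constants are standard reference values rather than numbers from the paper, and radiative corrections and the experimental analysis itself are ignored.

```python
import math

ALPHA = 1 / 137.035999        # fine-structure constant (low-energy value)
HBARC2_NB_GEV2 = 389379.4     # (hbar*c)^2 in nb * GeV^2

def sigma_mumu_nb(sqrt_s_gev):
    """Lowest-order QED cross section for e+e- -> mu+mu- in nanobarns."""
    s = sqrt_s_gev ** 2
    return 4 * math.pi * ALPHA ** 2 / (3 * s) * HBARC2_NB_GEV2

def r_value(sigma_had_nb, sqrt_s_gev):
    """R = sigma(e+e- -> hadrons) / sigma(e+e- -> mu+mu-)."""
    return sigma_had_nb / sigma_mumu_nb(sqrt_s_gev)

# Denominator at a centre-of-mass energy of 3 GeV, inside the 2-5 GeV range.
sigma_3gev = sigma_mumu_nb(3.0)
```

The denominator falls as 1/s, so R is a convenient dimensionless way to compare hadron production at different energies.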

    First Measurement of the Branching Fraction of the Decay psi(2S) --> tau tau

    The branching fraction of the psi(2S) decay into a tau pair has been measured for the first time using the BES detector at the Beijing Electron-Positron Collider. The result is $B_{\tau\tau}=(2.71\pm 0.43 \pm 0.55) \times 10^{-3}$, where the first error is statistical and the second is systematic. This value, along with those for the branching fractions into e+e- and mu+mu- of this resonance, satisfies well the relation predicted by the sequential lepton hypothesis. Combining all these values with the leptonic width of the resonance, the total width of the psi(2S) is determined to be $(252 \pm 37)$ keV. Comment: 9 pages, 2 figures
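
The sequential lepton relation mentioned in the abstract above is usually written as B_ee/v_e = B_mumu/v_mu = B_tautau/v_tau, where v_l = (1 + 2 m_l²/M²) sqrt(1 - 4 m_l²/M²) is a phase-space factor. The sketch below evaluates that factor and rescales the quoted central value of B_tautau. The masses are approximate reference values and the function name is hypothetical; neither is taken from the paper, and no uncertainties are propagated.

```python
import math

M_PSI2S_GEV = 3.6861      # approximate psi(2S) mass
M_TAU_GEV = 1.77686       # approximate tau mass
M_ELECTRON_GEV = 0.000511 # approximate electron mass

def phase_space_factor(m_lepton, m_resonance):
    """v_l = (1 + 2x) * sqrt(1 - 4x), with x = (m_lepton / m_resonance)^2."""
    x = (m_lepton / m_resonance) ** 2
    return (1 + 2 * x) * math.sqrt(1 - 4 * x)

v_tau = phase_space_factor(M_TAU_GEV, M_PSI2S_GEV)
v_e = phase_space_factor(M_ELECTRON_GEV, M_PSI2S_GEV)

# Central value quoted in the abstract, with the phase-space suppression removed.
B_TAUTAU = 2.71e-3
b_tautau_rescaled = B_TAUTAU / v_tau
```

Under the hypothesis, `b_tautau_rescaled` is the quantity that should agree with the e+e- and mu+mu- branching fractions, since the phase-space factors for the light leptons are very close to 1.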

    Measurement of the Total Cross Section for Hadronic Production by e+e- Annihilation at Energies between 2.6-5 GeV

    Using the upgraded Beijing Spectrometer (BESII), we have measured the total cross section for $e^+e^-$ annihilation into hadronic final states at center-of-mass energies of 2.6, 3.2, 3.4, 3.55, 4.6 and 5.0 GeV. Values of $R$, $\sigma(e^+e^-\to \mathrm{hadrons})/\sigma(e^+e^-\to\mu^+\mu^-)$, are determined. Comment: Submitted to Phys. Rev. Lett.

    Measurement of the Inclusive Charm Cross Section at 4.03 GeV and 4.14 GeV

    The cross section for charmed meson production at $\sqrt{s} = 4.03$ and 4.14 GeV has been measured with the Beijing Spectrometer. The measurement was made using 22.3 $\mathrm{pb}^{-1}$ of $e^+e^-$ data collected at 4.03 GeV and 1.5 $\mathrm{pb}^{-1}$ of $e^+e^-$ data collected at 4.14 GeV. Inclusive observed cross sections for the production of charged and neutral D mesons and momentum spectra are presented. Observed cross sections were radiatively corrected to obtain tree level cross sections. Measurements of the total hadronic cross section are obtained from the charmed meson cross section and an extrapolation of results from below the charm threshold. Comment: 11 pages, 13 figures. The top level tex file is paper.tex. It builds the paper from other tex files in this .tar and the .eps files

    Study of the P-wave charmonium state $\chi_{cJ}$ in $\psi(2S)$ decays

    The processes $\psi(2S)\to \gamma \pi^+ \pi^-$, $\gamma K^+ K^-$ and $\gamma p \bar{p}$ have been studied using a sample of $3.7 \times 10^6$ produced $\psi(2S)$ decays. We determine the total width of the $\chi_{c0}$ to be $\Gamma^{tot}_{\chi_{c0}} = 14.3\pm 2.0\pm 3.0$ MeV. We present the first measurement of the branching fraction $B(\chi_{c0} \to p \bar{p}) = (16.3 \pm 4.4 \pm 5.4)\times 10^{-5}$, where the first error is statistical and the second one systematic. Branching fractions of $\chi_{c0,2} \to \pi^+ \pi^-$ and $K^+ K^-$ are also reported. Comment: 10 pages, revtex, 3 figures, 2 tables

    Phase Separation Models for Cuprate Stripe Arrays

    An electronic phase separation model provides a natural explanation for a large variety of experimental results in the cuprates, including evidence for both stripes and larger domains, and a termination of the phase separation in the slightly overdoped regime, when the average hole density equals that on the charged stripes. Several models are presented for charged stripes, showing how density waves, superconductivity, and strong correlations compete with quantum size effects (QSEs) in narrow stripes. The energy bands associated with the charged stripes develop in the middle of the Mott gap, and the splitting of these bands can be understood by considering the QSE on a single ladder. Comment: significant revisions: includes island phase, 16 eps figures, revtex