1,837 research outputs found

    HAQ: Hardware-Aware Automated Quantization with Mixed Precision

    Full text link
    Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emerging DNN hardware accelerators have begun to support mixed precision (1-8 bits) to further improve computational efficiency, which raises a great challenge: finding the optimal bitwidth for each layer requires domain experts to explore a vast design space, trading off accuracy, latency, energy, and model size, which is both time-consuming and sub-optimal. Conventional quantization algorithms ignore differences between hardware architectures and quantize all layers in a uniform way. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy, taking the hardware accelerator's feedback into the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback signals (latency and energy) for the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network and hardware architectures. It effectively reduces latency by 1.4-1.95x and energy consumption by 1.9x with negligible loss of accuracy compared with fixed-bitwidth (8-bit) quantization. Our framework reveals that the optimal policies on different hardware architectures (i.e., edge and cloud architectures) under different resource constraints (i.e., latency, energy, and model size) are drastically different. We interpret the implications of the different quantization policies, which offer insights for both neural network architecture design and hardware architecture design.
    Comment: CVPR 2019. The first three authors contributed equally to this work. Project page: https://hanlab.mit.edu/projects/haq
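
    As a minimal sketch of the idea, the loop below searches per-layer bitwidths against a toy latency model. It only stands in for HAQ's actual components: the paper uses a DDPG reinforcement-learning agent and a cycle-accurate hardware simulator, whereas here the random-search "agent", the layer MAC counts, the latency budget, and the accuracy model are all made-up assumptions for illustration.

        import random

        # Assumed per-layer MAC counts (millions) and latency budget -- illustrative only.
        LAYERS = [1.0, 4.0, 4.0, 2.0, 0.5]
        LATENCY_BUDGET = 20.0

        def simulated_latency(bitwidths):
            # Toy hardware model: cost grows with MACs and chosen precision.
            return sum(macs * b / 8.0 for macs, b in zip(LAYERS, bitwidths))

        def proxy_accuracy(bitwidths):
            # Toy accuracy model: lower precision hurts, earlier layers more sensitive.
            penalty = sum((8 - b) * 0.4 / (i + 1) for i, b in enumerate(bitwidths))
            return 76.0 - penalty  # assume 76 % top-1 at uniform 8 bits

        best_policy, best_acc = None, -1.0
        for episode in range(2000):
            policy = [random.randint(1, 8) for _ in LAYERS]  # action: bitwidth per layer
            if simulated_latency(policy) > LATENCY_BUDGET:
                continue  # hardware constraint violated, no reward
            acc = proxy_accuracy(policy)  # direct feedback instead of a FLOPs proxy
            if acc > best_acc:
                best_policy, best_acc = policy, acc

        print(f"best policy {best_policy}: acc ~{best_acc:.2f} %, "
              f"latency ~{simulated_latency(best_policy):.1f}")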

    Hardware-Centric AutoML for Mixed-Precision Quantization

    Full text link
    Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emerging DNN hardware accelerators have begun to support mixed precision (1-8 bits) to further improve computational efficiency, which raises a great challenge: finding the optimal bitwidth for each layer requires domain experts to explore a vast design space, trading off accuracy, latency, energy, and model size, which is both time-consuming and sub-optimal. Conventional quantization algorithms ignore differences between hardware architectures and quantize all layers in a uniform way. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy, taking the hardware accelerator's feedback into the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback signals (latency and energy) for the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network and hardware architectures. It effectively reduces latency by 1.4-1.95x and energy consumption by 1.9x with negligible loss of accuracy compared with fixed-bitwidth (8-bit) quantization. Our framework reveals that the optimal policies on different hardware architectures (i.e., edge and cloud architectures) under different resource constraints (i.e., latency, energy, and model size) are drastically different. We interpret the implications of the different quantization policies, which offer insights for both neural network architecture design and hardware architecture design.
    Comment: Journal preprint of arXiv:1811.08886 (IJCV, 2020). The first three authors contributed equally to this work. Project page: https://hanlab.mit.edu/projects/haq
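
    The 1-8 bit per-layer quantization that both HAQ abstracts refer to can be illustrated with a common symmetric linear ("fake") quantizer. This is a generic sketch, not necessarily the exact quantization function HAQ applies.

        import numpy as np

        def linear_quantize(w, bits):
            # Symmetric linear quantization of a weight array to a given bitwidth.
            qmax = 2 ** (bits - 1) - 1               # e.g. 127 for 8 bits
            scale = np.abs(w).max() / qmax or 1.0    # map the largest weight to qmax
            q = np.clip(np.round(w / scale), -qmax, qmax)
            return q * scale                         # dequantized weights

        w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
        for bits in (8, 4, 2):
            err = np.abs(w - linear_quantize(w, bits)).mean()
            print(f"{bits}-bit: mean abs error {err:.4f}")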

    Antioxidant and Cytotoxic Activity of Extracts of Milk Produced from Fermented Black Soybean

    Get PDF
    In this study, ethanol extracts of black soybean milk fermented for 2 days (FBE) by immobilized Rhizopus oligosporus NTU5 were evaluated for both antioxidant and cytotoxic activities. The results reveal that the 2-day FBE had a strong 2,2-diphenyl-1-picrylhydrazyl (DPPH) scavenging effect (76 %). The extracts were further fractionated by silica gel column chromatography, yielding an unknown compound, FBE5-A, which exhibited strong antioxidant activity. The IC50 of the DPPH scavenging effect of FBE5-A was 7.5 μg/mL, stronger than that of a commonly used antioxidant, vitamin E (α-tocopherol; 17.4 μg/mL), and similar to that of vitamin C (ascorbic acid; 7.6 μg/mL). The cytotoxicity test demonstrated that extracts of the 2-day fermented broth exhibited selective cytotoxic activity towards human carcinoma cells, Hep 3B (IC50=150.2 μg/mL), and did not affect normal human lung fibroblasts, MRC-5 (p<0.05). The results indicate the potential of fermented black soybean milk for applications as a functional food, pharmaceutical, or cancer therapy formulation.
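
    The comparisons above hinge on IC50 values, the concentration giving 50 % DPPH scavenging. As a rough sketch of how such a value is typically read off a dose-response curve, the snippet below interpolates on log-concentration; the data points are invented for illustration and are not from the paper.

        import numpy as np

        # Hypothetical dose-response data: concentration (ug/mL) vs. % DPPH scavenging.
        conc = np.array([1.0, 2.5, 5.0, 10.0, 20.0])
        scav = np.array([12.0, 28.0, 41.0, 58.0, 74.0])

        # Scavenging is usually computed as (A_control - A_sample) / A_control * 100
        # from absorbance readings; here we start from the percentages directly.

        log_ic50 = np.interp(50.0, scav, np.log10(conc))  # linear in log-dose
        print(f"estimated IC50 ~ {10 ** log_ic50:.1f} ug/mL")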

    Poly[(μ6-benzene-1,3,5-tricarboxylato-κ6O1:O1′:O3:O3′:O5:O5′)tris(N,N-dimethylformamide-κO)tris(μ3-formato-κ2O:O′)trimagnesium(II)]

    Get PDF
    The title complex, [Mg3(CHO2)3(C9H3O6)(C3H7NO)3]n, exhibits a two-dimensional structure parallel to (001), which is built up from the MgII atoms and bridging carboxylate ligands (3 symmetry). The MgII atom is six-coordinated by one O atom from a dimethylformamide molecule, two O atoms from two μ6-benzene-1,3,5-tricarboxylate ligands and three O atoms from three μ3-formate ligands in a distorted octahedral geometry.

    Poly[diaqua-μ4-biphenyl-4,4′-dicarboxylato-magnesium(II)]

    Get PDF
    The solvothermal reaction of magnesium nitrate with biphenyl-4,4′-dicarboxylic acid in N,N-dimethylformamide and water leads to the formation of crystals of the title complex, [Mg(C14H8O4)(H2O)2]n. In the crystal structure, the Mg cations are coordinated by six O atoms from two water molecules and four symmetry-related biphenyl-4,4′-dicarboxylate anions within slightly distorted octahedra. The Mg cations are located on a center of inversion, the biphenyl-4,4′-dicarboxylate anions around a twofold rotation axis, and the water molecule in a general position. The Mg cations are linked by the anions into a three-dimensional framework.

    APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

    Full text link
    We present APQ for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that search the neural architecture, pruning policy, and quantization policy separately, we optimize them jointly. To deal with the larger design space this brings, a promising approach is to train a quantization-aware accuracy predictor that quickly estimates the accuracy of a quantized model and feeds it to the search engine to select the best fit. However, training this quantization-aware accuracy predictor requires collecting a large number of quantized model-accuracy pairs, which involves quantization-aware finetuning and is therefore highly time-consuming. To tackle this challenge, we propose to transfer the knowledge from a full-precision (i.e., fp32) accuracy predictor to the quantization-aware (i.e., int8) accuracy predictor, which greatly improves sample efficiency. Moreover, collecting the dataset for the fp32 accuracy predictor only requires evaluating neural networks, without any training cost, by sampling from a pretrained once-for-all network, which is highly efficient. Extensive experiments on ImageNet demonstrate the benefits of our joint optimization approach. At the same accuracy, APQ reduces latency/energy by 2x/1.3x over MobileNetV2+HAQ. Compared to the separate optimization approach (ProxylessNAS+AMC+HAQ), APQ achieves 2.3% higher ImageNet accuracy while reducing GPU hours and CO2 emissions by orders of magnitude, pushing the frontier of green, environmentally friendly AI. The code and video are publicly available.
    Comment: Accepted by CVPR 2020
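
    The sample-efficiency argument (pretrain a cheap full-precision accuracy predictor, then transfer it to the quantization-aware setting) can be sketched with a toy regression. APQ itself transfers the weights of a neural-network predictor trained on architectures sampled from a once-for-all network; the ridge-regression stand-in and the synthetic data below are assumptions made purely to illustrate the transfer idea.

        import numpy as np

        rng = np.random.default_rng(1)

        def ridge_fit(X, y, w0=None, lam=1e-2):
            # Closed-form ridge regression; if w0 is given, regularize toward it
            # (a crude stand-in for initializing the int8 predictor from fp32).
            d = X.shape[1]
            w0 = np.zeros(d) if w0 is None else w0
            return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w0)

        # Cheap fp32 pairs: many (architecture features -> accuracy) samples.
        X_fp, w_true = rng.normal(size=(500, 8)), rng.normal(size=8)
        y_fp = X_fp @ w_true + rng.normal(scale=0.1, size=500)

        # Expensive quantized pairs: only a few samples, slightly shifted relationship.
        X_q = rng.normal(size=(20, 8))
        y_q = X_q @ (0.9 * w_true) + rng.normal(scale=0.1, size=20)

        w_fp32 = ridge_fit(X_fp, y_fp)                    # pretrain on cheap data
        w_tr = ridge_fit(X_q, y_q, w0=w_fp32, lam=1.0)    # transfer, then finetune
        w_sc = ridge_fit(X_q, y_q)                        # baseline: from scratch

        X_te = rng.normal(size=(200, 8))
        y_te = X_te @ (0.9 * w_true)
        for name, w in (("transfer", w_tr), ("scratch", w_sc)):
            print(name, "test MSE:", round(float(np.mean((X_te @ w - y_te) ** 2)), 4))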