
    Nonlinear-Cost Random Walk: exact statistics of the distance covered for fixed budget

    We consider the Nonlinear-Cost Random Walk model in discrete time introduced in [Phys. Rev. Lett. 130, 237102 (2023)], where a fee is charged for each jump of the walker. The nonlinear cost function is such that slow/short jumps incur a flat fee, while for fast/long jumps the cost is proportional to the distance covered. In this paper we compute analytically the average and variance of the distance covered in n steps when the total budget C is fixed, as well as the statistics of the number of long/short jumps in a trajectory of length n, for the exponential jump distribution. These observables exhibit a very rich and non-monotonic scaling behavior as a function of the variable C/n, which is traced back to the makeup of a typical trajectory in terms of long/short jumps, and the resulting "entropy" thereof. As a byproduct, we compute the asymptotic behavior of ratios of Kummer hypergeometric functions when both the first and last arguments are large. All our analytical results are corroborated by numerical simulations. Comment: 31 pages, 8 figures.
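    The model lends itself to a quick numerical check. Below is a minimal Monte Carlo sketch of one reading of the setup (the exact cost function and units are assumptions, not taken from the paper): jump lengths are unit-mean exponential, the cost of a jump of length x is max(1, x), and the exact conditioning on a fixed total budget C is approximated by keeping trajectories whose total cost lands in a narrow window around C.

```python
# Monte Carlo sketch of the Nonlinear-Cost Random Walk (hypothetical
# parametrization, not the paper's exact one): jump lengths |eta| ~ Exp(1),
# cost c(eta) = max(1, |eta|): flat fee for short jumps, linear for long ones.
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, n_traj=200_000):
    """Total distance covered and total cost for n_traj walks of n jumps."""
    jumps = rng.exponential(scale=1.0, size=(n_traj, n))
    cost = np.maximum(1.0, jumps).sum(axis=1)   # flat fee vs. linear cost
    dist = jumps.sum(axis=1)                    # distance covered
    return dist, cost

n, C, width = 20, 30.0, 0.5
dist, cost = simulate(n)
near_C = np.abs(cost - C) < width               # crude conditioning on budget C
print(f"E[distance | cost ~ C] ~ {dist[near_C].mean():.3f} "
      f"({near_C.sum()} trajectories kept)")
```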

    HW-Flow-Fusion: Inter-Layer Scheduling for Convolutional Neural Network Accelerators with Dataflow Architectures

    Energy- and throughput-efficient acceleration of convolutional neural networks (CNNs) on devices with a strict power budget is achieved by leveraging different scheduling techniques to minimize data movement and maximize data reuse. Several dataflow mapping frameworks have been developed to explore the optimal scheduling of CNN layers on reconfigurable accelerators. However, previous works usually optimize each layer individually, without leveraging the data reuse between the layers of CNNs. In this work, we present an analytical model to achieve efficient data reuse by searching for efficient scheduling of communication and computation across layers. We call this inter-layer scheduling framework HW-Flow-Fusion, as we explore the fused map-space of multiple layers sharing the available resources of the same accelerator, investigating the constraints and trade-offs of mapping the execution of multiple workloads with data dependencies. We propose a memory-efficient data reuse model, tiling, and resource partitioning strategies to fuse multiple layers without recomputation. Compared to standard single-layer scheduling, inter-layer scheduling can reduce the communication volume by 51% and 53% for selected VGG16-E and ResNet18 layers on a spatial array accelerator, and reduce the latency by 39% and 34% respectively, while also increasing the computation-to-communication ratio, which improves the memory bandwidth efficiency.
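    The arithmetic that makes fusion without recomputation possible can be sketched independently of the paper's framework: the output tile of the last fused layer pins down, via the standard receptive-field recurrence, how large a tile each earlier layer must produce and keep on-chip. A hypothetical helper (the function name and the two-layer example are ours):

```python
# Hypothetical helper (not the paper's framework): size of the input tile a
# stack of conv layers needs to produce an out_tile x out_tile output tile
# without recomputation, via the standard receptive-field recurrence.
def input_tile_size(out_tile: int, layers: list[tuple[int, int]]) -> int:
    """layers: (kernel, stride) per layer, listed input-to-output."""
    size = out_tile
    for k, s in reversed(layers):       # walk backwards from the output tile
        size = (size - 1) * s + k
    return size

# Two 3x3 stride-1 convs fused: an 8x8 output tile needs a 12x12 input tile,
# and the intermediate (layer-1 output) tile kept on-chip must be 10x10.
print(input_tile_size(8, [(3, 1), (3, 1)]))  # -> 12
print(input_tile_size(8, [(3, 1)]))          # -> 10 (intermediate tile)
```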

    Pruning as a Binarization Technique

    Convolutional neural networks (CNNs) can be quantized to reduce the bit-width of their weights and activations. Pruning is another compression technique, where entire structures are removed from a CNN’s computation graph. Multi-bit networks (MBNs) encode the operands (weights and activations) of the convolution into multiple binary bases, where the bit-width of a particular operand equals its number of binary bases. Therefore, this work views pruning an individual binary base in an MBN as a reduction in the bit-width of its operands, i.e. quantization. Although many binarization methods have improved the accuracy of binary neural networks (BNNs) by, e.g., minimizing quantization error, improving training strategies, or proposing different network architecture designs, we reveal a new viewpoint to achieve high-accuracy BNNs, which leverages pruning as a binarization technique (PaBT). We exploit gradient information that exposes the importance of each binary convolution and its contribution to the loss. We prune entire binary convolutions, reducing the effective bit-widths of the MBN during training. This ultimately results in a smooth convergence to accurate BNNs. PaBT achieves 2.9 p.p., 1.6 p.p., and 0.9 p.p. better accuracy than the SotA BNNs IR-Net, LNS, and SiMaN on the ImageNet dataset, respectively. Further, PaBT scales to the more complex task of semantic segmentation, outperforming ABC-Net on the CityScapes dataset. This positions PaBT as a novel high-accuracy binarization scheme, and makes it the first to expose the potential of latent-weight-free training for compression techniques.
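    The pruning-as-quantization viewpoint can be illustrated with a small sketch. It uses an ABC-Net-style greedy residual binarization (our choice of MBN decomposition) and ranks bases by the magnitude of their scale, a simplification of PaBT's gradient-based importance: dropping the weakest base lowers the effective bit-width from 3 to 2.

```python
# Sketch of the multi-bit-network view (assumptions: ABC-Net-style residual
# binarization, importance = |alpha| rather than PaBT's gradient criterion).
# A weight tensor is approximated as a sum of scaled binary bases; pruning
# the weakest base reduces the effective bit-width by one.
import numpy as np

def residual_binarize(w, n_bases):
    """Greedy residual decomposition: w ~ sum_i alpha_i * sign(residual_i)."""
    bases, alphas, r = [], [], w.copy()
    for _ in range(n_bases):
        b = np.sign(r); b[b == 0] = 1.0
        a = np.abs(r).mean()            # least-squares scale for a sign basis
        bases.append(b); alphas.append(a)
        r = r - a * b
    return np.array(alphas), np.array(bases)

w = np.random.default_rng(1).normal(size=1000)
alphas, bases = residual_binarize(w, n_bases=3)
keep = np.argsort(alphas)[1:]           # prune the least-important binary base
w_pruned = (alphas[keep, None] * bases[keep]).sum(axis=0)
print("3-base err:", np.abs(w - (alphas[:, None] * bases).sum(0)).mean(),
      "2-base err:", np.abs(w - w_pruned).mean())
```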

    MATAR: Multi-Quantization-Aware Training for Accurate and Fast Hardware Retargeting

    Quantization of deep neural networks (DNNs) reduces their memory footprint and simplifies their hardware arithmetic logic, enabling efficient inference on edge devices. Different hardware targets can support different forms of quantization, e.g. full 8-bit, 8/4/2-bit mixed-precision combinations, or fully-flexible bit-serial solutions. This makes standard quantization-aware training (QAT) of a DNN for different targets challenging, as the supported quantization levels of each target must be carefully considered at training time. In this paper, we propose a generalized QAT solution that results in a DNN which can be retargeted to different hardware without any retraining or prior knowledge of the hardware’s supported quantization policy. First, we present a novel training scheme which makes the model aware of multiple quantization strategies. Then we demonstrate the retargeting capabilities of the resulting DNN by using a genetic algorithm to search for layer-wise, mixed-precision solutions that maximize performance and/or accuracy on the hardware target, without the need for fine-tuning. By making the DNN agnostic of the final hardware target, our method allows DNNs to be distributed to many users on different hardware platforms, without the DNN developers having to share their training loop or dataset, and without the end-users of the efficient quantized solution having to detail their hardware capabilities ahead of time. Models trained with our approach generalize to multiple quantization policies with minimal accuracy degradation compared to target-specific quantization counterparts.
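    A minimal sketch of the general idea of making a model aware of multiple quantization strategies (our reconstruction, not the paper's code; the names BITS, fake_quant, and MultiQuantLinear are hypothetical): each forward pass samples a bit-width from the supported set and applies straight-through fake quantization to the weights, so a single set of latent weights stays usable across policies.

```python
# Sketch of multi-quantization-aware training (our reading, not MATAR's exact
# scheme): sample a bit-width per forward pass, fake-quantize the weights with
# a straight-through estimator, keep full-precision latent weights throughout.
import torch
import torch.nn as nn

BITS = (8, 4, 2)  # hypothetical set of supported bit-widths

def fake_quant(x, bits):
    """Symmetric uniform fake quantization with a straight-through gradient."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().amax().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (q - x).detach()  # STE: quantized forward, identity backward

class MultiQuantLinear(nn.Linear):
    """Linear layer whose weights see a randomly sampled bit-width each pass."""
    def forward(self, x):
        # At eval time default to the widest policy (a simplification).
        bits = BITS[torch.randint(len(BITS), (1,)).item()] if self.training else BITS[0]
        return nn.functional.linear(x, fake_quant(self.weight, bits), self.bias)

layer = MultiQuantLinear(16, 4)
layer.train()
print(layer(torch.randn(2, 16)).shape)  # forward under a sampled bit-width
```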

    Accelerating and pruning CNNs for semantic segmentation on FPGA

    Semantic segmentation is one of the popular tasks in computer vision, providing pixel-wise annotations for scene understanding. However, segmentation-based convolutional neural networks require tremendous computational power. In this work, a fully-pipelined hardware accelerator with support for dilated convolution is introduced, which cuts down the redundant zero multiplications. Furthermore, we propose a genetic-algorithm-based automated channel pruning technique to jointly optimize computational complexity and model accuracy. Finally, hardware heuristics and an accurate model of the custom accelerator design enable a hardware-aware pruning framework. We achieve 2.44x lower latency with minimal degradation in semantic prediction quality (1.98 pp lower mean intersection over union) compared to the baseline DeepLabV3+ model, evaluated on an Arria-10 FPGA. The binary files of the FPGA design, and the baseline and pruned models, can be found at github.com/pierpaolomori/SemanticSegmentationFPGA.
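    The genetic-algorithm pruning loop can be sketched in toy form; the latency model and accuracy proxy below are stand-ins for the paper's accelerator model, and every constant is illustrative.

```python
# Toy sketch of GA-based channel pruning (illustrative only): a genome holds a
# keep-ratio per layer; fitness trades an invented stand-in latency model
# against a crude proxy for accuracy loss. Not the paper's accelerator model.
import random

random.seed(0)
LAYERS = [64, 128, 256]                 # hypothetical per-layer channel counts

def latency(genome):                    # stand-in: latency ~ surviving channels
    return sum(int(c * r) for c, r in zip(LAYERS, genome))

def fitness(genome):
    acc_penalty = sum((1 - r) ** 2 for r in genome)  # proxy: pruning hurts
    return -(latency(genome) / sum(LAYERS) + 5.0 * acc_penalty)

def mutate(g):
    return [min(1.0, max(0.1, r + random.uniform(-0.1, 0.1))) for r in g]

pop = [[random.uniform(0.3, 1.0) for _ in LAYERS] for _ in range(20)]
for _ in range(50):                     # evolve: keep the best half, mutate it
    pop.sort(key=fitness, reverse=True)
    pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(10)]
print("best keep-ratios:", [round(r, 2) for r in max(pop, key=fitness)])
```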

    Wino Vidi Vici: Conquering Numerical Instability of 8-Bit Winograd Convolution for Accurate Inference Acceleration on Edge

    Winograd-based convolution can reduce the total number of operations needed for convolutional neural network (CNN) inference on edge devices. Most edge hardware accelerators use low-precision, 8-bit integer arithmetic units to improve energy efficiency and latency. This makes CNN quantization a critical step before deploying a model on such an edge device. To extract the benefits of fast Winograd-based convolution and efficient integer quantization, the two approaches must be combined. Research has shown that the transform required to execute convolutions in the Winograd domain results in numerical instability and severe accuracy degradation when combined with quantization, making the two techniques incompatible on edge hardware. This paper proposes a novel training scheme to achieve efficient Winograd-accelerated, quantized CNNs. 8-bit quantization is applied to all the intermediate results of the Winograd convolution without sacrificing task-related accuracy. This is achieved by introducing clipping factors in the intermediate quantization stages, as well as by using a complex number system to improve the transform. We achieve 2.8x and 2.1x reductions in MAC operations on ResNet-20-CIFAR-10 and ResNet-18-ImageNet, respectively, with no accuracy degradation.
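    The pipeline can be illustrated with the standard 1D F(2,3) Winograd transform; the per-stage clipping factors applied before each 8-bit quantization are our rendering of the idea, not the paper's exact scheme (which also uses a complex number system for the transform).

```python
# 1D Winograd F(2,3) with 8-bit fake quantization of all intermediates.
# The transform matrices are the standard F(2,3) ones; the per-stage clipping
# factor is our illustration of the paper's idea, not its exact scheme.
import numpy as np

BT = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]], float)
G  = np.array([[1, 0, 0], [.5, .5, .5], [.5, -.5, .5], [0, 0, 1]], float)
AT = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], float)

def q8(x, clip=1.0):
    """Quantize to an int8 grid after clipping the dynamic range."""
    s = clip * np.abs(x).max() / 127 + 1e-12
    return np.clip(np.round(x / s), -128, 127) * s

d = np.array([1.0, 2.0, 3.0, 4.0])       # 4 inputs -> 2 outputs
g = np.array([0.5, -1.0, 0.25])          # 3-tap filter

U = q8(G @ g, clip=0.9)                  # transformed, quantized weights
V = q8(BT @ d, clip=0.9)                 # transformed, quantized inputs
Y = AT @ q8(U * V, clip=0.9)             # elementwise product, inverse transform

print(Y, np.convolve(d, g[::-1], mode="valid"))  # compare with direct conv
```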

    Bladder metastases of appendiceal mucinous adenocarcinoma: a case presentation

    Background: Appendiceal adenocarcinoma is rare, with a frequency of 0.08% of all surgically removed appendices. Few cases of appendiceal carcinoma infiltrating the bladder wall by spatial contiguity have been documented.

    Case Presentation: A case is reported of a 45-year-old woman with mucinous cystadenocarcinoma of the appendix with bladder metastasis. Although ultrasonography and voided urinary cytology were negative, an abdominal computed tomography (CT) scan, cystoscopy, and the subsequent pathological examination revealed a mass exclusively located in the anterior wall of the bladder. Histopathology of the transurethral bladder resection revealed a bladder adenocarcinoma (6 cm maximum diameter x 2.5 cm; approximate weight 10 g) with focal mucinous aspects penetrating the muscle and perivisceral fat. Laparotomy evidenced the presence of a solid mass of the appendix (2.5 cm x 3 cm x 2 cm) extending to the loco-regional lymph nodes. Appendectomy, right hemicolectomy, lymphadenectomy, and partial cystectomy were performed. The subsequent pathological examination revealed a mucinous cystadenocarcinoma of the appendix with metastatic cells colonising the anterior bladder wall and several colic lymph nodes.

    Conclusions: The rarity of appendiceal carcinoma invading the urinary bladder, which usually involves adjacent organs and the posterior bladder wall, led us to describe this case, which demonstrates the ability of appendiceal cancer to metastasize to different regions of the urinary bladder.

    Molecular insights to the bioactive form of BV02, a reference inhibitor of 14-3-3σ protein-protein interactions

    BV02 is a reference inhibitor of 14-3-3 protein-protein interactions (PPIs), currently used as a chemical biology tool to understand the role of 14-3-3 proteins in pathological contexts. Owing to its chemical instability under certain conditions, its bioactive form has remained unclear. Here, we use NMR spectroscopy to prove for the first time the direct interaction between the molecule and 14-3-3σ, and to depict its bioactive form, namely the phthalimide derivative 9. Our work provides molecular insights into the bioactive form of this 14-3-3 PPI inhibitor and facilitates its further development as a candidate therapeutic agent.