33 research outputs found

    Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs

    Recent advancements in multimodal large language models (MLLMs) have achieved significant multimodal generation capabilities, akin to GPT-4. These models predominantly map visual information into the language representation space, leveraging the vast knowledge and powerful text generation abilities of LLMs to produce multimodal instruction-following responses. This method could be termed LLMs for Vision, since it employs LLMs for visual-language understanding; yet we observe that such MLLMs neglect the potential of harnessing visual knowledge to enhance the overall capabilities of LLMs, which could be regarded as Vision Enhancing LLMs. In this paper, we propose an approach called MKS2, aimed at enhancing LLMs by empowering Multimodal Knowledge Storage and Sharing in LLMs. Specifically, we introduce the Modular Visual Memory, a component integrated into the internal blocks of LLMs and designed to store open-world visual information efficiently. Additionally, we present a soft Mixture-of-Multimodal-Experts architecture in LLMs to invoke multimodal knowledge collaboration during generation. Our comprehensive experiments demonstrate that MKS2 substantially augments the reasoning capabilities of LLMs in contexts that require physical or commonsense knowledge, and it also delivers competitive results on multimodal benchmarks.
    Comment: 12 pages, 4 figures
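    The soft expert-mixing idea described above can be illustrated with a toy sketch: a learned gate softly blends the LLM's original feed-forward path (a "textual expert") with a lookup into a small key/value store standing in for the Modular Visual Memory. All sizes, parameters, and function names below are invented placeholders under those assumptions, not the paper's actual architecture.

```python
import math
import random

random.seed(0)
D, K = 4, 3   # toy hidden size and number of visual-memory slots

def rand_mat(rows, cols):
    return [[random.gauss(0, 0.3) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Hypothetical parameters: the LLM's original FFN acts as a "textual
# expert"; a small key/value store stands in for the Modular Visual Memory.
W_ffn = rand_mat(D, D)
mem_keys = rand_mat(K, D)
mem_vals = rand_mat(K, D)
W_gate = rand_mat(2, D)          # soft router over the two experts

def textual_expert(h):
    return [max(x, 0.0) for x in matvec(W_ffn, h)]   # plain ReLU FFN

def visual_expert(h):
    attn = softmax(matvec(mem_keys, h))              # address visual memory
    return [sum(a * mem_vals[k][j] for k, a in enumerate(attn))
            for j in range(D)]                       # retrieved knowledge

def soft_moe(h):
    g_text, g_vis = softmax(matvec(W_gate, h))       # soft expert weights
    t, v = textual_expert(h), visual_expert(h)
    return [g_text * ti + g_vis * vi for ti, vi in zip(t, v)]

h = [random.gauss(0, 1) for _ in range(D)]           # one toy token state
out = soft_moe(h)
print(len(out))   # 4
```

    Because the gate is a softmax rather than a hard argmax, both knowledge sources contribute to every token, which is what "soft" mixing means here.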

    Generative Model for Models: Rapid DNN Customization for Diverse Tasks and Resource Constraints

    Unlike cloud-based deep learning models, which are often large and uniform, edge-deployed models usually demand customization for domain-specific tasks and resource-limited environments. Such customization can be costly and time-consuming due to the diversity of edge scenarios and the training load for each scenario. Although various approaches have been proposed for rapid resource-oriented customization and task-oriented customization respectively, achieving both at the same time is challenging. Drawing inspiration from generative AI and the modular composability of neural networks, we introduce NN-Factory, a one-for-all framework that generates customized lightweight models for diverse edge scenarios. The key idea is to use a generative model to directly produce the customized models instead of training them. The main components of NN-Factory are a modular supernet with pretrained modules that can be conditionally activated to accomplish different tasks, and a generative module assembler that manipulates the modules according to task and sparsity requirements. Given an edge scenario, NN-Factory can efficiently customize a compact model specialized for the edge task while satisfying the edge resource constraints, by searching for the optimal strategy to assemble the modules. In experiments on image classification and object detection tasks with different edge devices, NN-Factory generates high-quality task- and resource-specific models within a few seconds, faster than conventional model customization approaches by orders of magnitude.
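    The assemble-under-a-budget step can be sketched as follows. This is not NN-Factory's generative assembler (which is a learned model); a simple greedy search stands in for it, and the module names, FLOP costs, and task scores are invented for illustration.

```python
# Toy supernet: pretrained modules with an (invented) resource cost and
# an (invented) usefulness score for the target task.
modules = {
    "stem":    {"flops": 10, "score": 5.0},
    "block_a": {"flops": 30, "score": 4.0},
    "block_b": {"flops": 20, "score": 3.5},
    "block_c": {"flops": 40, "score": 6.0},
    "head":    {"flops": 5,  "score": 2.0},
}

def assemble(budget, required=("stem", "head")):
    """Greedy stand-in for the generative assembler: activate the
    modules with the best task-score per FLOP within the budget."""
    chosen = list(required)
    spent = sum(modules[m]["flops"] for m in chosen)
    optional = sorted(
        (m for m in modules if m not in chosen),
        key=lambda m: modules[m]["score"] / modules[m]["flops"],
        reverse=True,
    )
    for m in optional:
        if spent + modules[m]["flops"] <= budget:
            chosen.append(m)
            spent += modules[m]["flops"]
    return chosen, spent

plan, cost = assemble(budget=60)
print(plan, cost)   # ['stem', 'head', 'block_b'] 35
```

    The key property mirrored here is that task requirements (scores) and resource constraints (the FLOP budget) are satisfied in a single assembly pass, rather than by retraining.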

    LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup

    On-device Deep Neural Network (DNN) inference consumes significant computing resources and development effort. To alleviate this, we propose LUT-NN, the first system to empower inference by table lookup, reducing inference cost. LUT-NN learns the typical features for each operator, named centroids, and precomputes the results for these centroids to save in lookup tables. During inference, the results of the centroids closest to the inputs can be read directly from the table as the approximated outputs, without computation. LUT-NN integrates two major novel techniques: (1) differentiable centroid learning through backpropagation, which adapts three levels of approximation to minimize the accuracy impact of centroids; (2) table lookup inference execution, which comprehensively considers different levels of parallelism, memory access reduction, and dedicated hardware units for optimal performance. LUT-NN is evaluated on multiple real tasks, covering image recognition, speech recognition, and natural language processing. Compared to related work, LUT-NN improves accuracy by 66% to 92%, achieving a level similar to the original models. LUT-NN reduces cost along all dimensions, including FLOPs (≤16x), model size (≤7x), latency (≤6.8x), memory (≤6.5x), and power (≤41.7%).
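    The precompute-then-lookup mechanism can be shown in a few lines. This is only a sketch of the idea under toy assumptions: random weights and random centroids replace the differentiable centroid learning the abstract describes, and a whole-vector nearest-centroid search replaces the per-subvector product quantization a real system would use.

```python
import random

random.seed(1)
D_IN, D_OUT, N_CENT = 4, 3, 8          # toy sizes

# Hypothetical layer weights and "learned" centroids (random stand-ins
# for the centroids LUT-NN learns by backpropagation).
W = [[random.gauss(0, 1) for _ in range(D_OUT)] for _ in range(D_IN)]
centroids = [[random.gauss(0, 1) for _ in range(D_IN)]
             for _ in range(N_CENT)]

def linear(x):
    """The original operator: a small matrix-vector product."""
    return [sum(x[i] * W[i][j] for i in range(D_IN)) for j in range(D_OUT)]

# Offline: run each centroid through the operator once; store the results.
table = [linear(c) for c in centroids]

def lut_forward(x):
    """Approximate linear(x) by returning the precomputed result of the
    centroid nearest to x, skipping the multiply-accumulates entirely."""
    def dist2(c):
        return sum((ci - xi) ** 2 for ci, xi in zip(c, x))
    nearest = min(range(N_CENT), key=lambda k: dist2(centroids[k]))
    return table[nearest]

# An input that coincides with a stored centroid reads its row exactly.
print(lut_forward(centroids[5]) == table[5])   # True
```

    The trade-off is visible even at this scale: inference cost drops to a distance computation plus a table read, and accuracy depends entirely on how well the centroids cover the operator's typical inputs.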

    Accelerating In-Browser Deep Learning Inference on Diverse Edge Clients through Just-in-Time Kernel Optimizations

    Web applications are increasingly becoming the primary platform for AI service delivery, making in-browser deep learning (DL) inference more prominent. However, current in-browser inference systems fail to effectively utilize advanced web programming techniques or to customize kernels for various client devices, leading to suboptimal performance. To address these issues, this paper presents the first in-browser inference system, nn-JIT.web, which enables just-in-time (JIT) auto-generation of optimized kernels for both CPUs and GPUs during inference. The system achieves this with two novel web programming techniques that significantly reduce kernel generation time, compared to other tensor compilers such as TVM, while maintaining or even improving performance. The first technique, Tensor-Web Compiling Co-Design, lowers compiling costs by unifying tensor and web compiling and eliminating redundant and ineffective compiling passes. The second technique, Web-Specific Lite Kernel Optimization Space Design, reduces kernel tuning costs by focusing on web programming requirements and efficient hardware resource utilization, limiting the optimization space to only dozens of candidates. nn-JIT.web is evaluated on modern transformer models across a range of client devices, including mainstream CPUs and GPUs from ARM, Intel, AMD, and Nvidia. Results show that nn-JIT.web achieves up to 8.2x speedup within 30 seconds compared to the baselines across various models.
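    The core JIT idea, generating a kernel's source at run time with device- and shape-specific details baked in, can be sketched minimally. Real in-browser systems emit WebAssembly or WebGPU shader code; Python source generated and compiled via `exec` stands in here purely for illustration, and the function name is invented.

```python
# Minimal sketch of just-in-time kernel specialization: generate source
# for an elementwise kernel with the trip count and operation inlined,
# then "compile" it at runtime.
def jit_elementwise_kernel(op_expr, n):
    src = (
        f"def kernel(xs):\n"
        f"    out = [0.0] * {n}\n"
        f"    for i in range({n}):\n"      # trip count is a constant
        f"        x = xs[i]\n"
        f"        out[i] = {op_expr}\n"    # operation inlined, no dispatch
        f"    return out\n"
    )
    namespace = {}
    exec(src, namespace)                   # compile the specialized kernel
    return namespace["kernel"]

relu4 = jit_elementwise_kernel("x if x > 0 else 0.0", 4)
print(relu4([-1.0, 2.0, -3.0, 4.0]))      # [0.0, 2.0, 0.0, 4.0]
```

    Specialization removes per-element branching and dispatch overhead; the cost is the generation step itself, which is exactly what the paper's two techniques aim to keep small.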

    Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training

    An activation function is an element-wise mathematical function that plays a crucial role in deep neural networks (DNNs). Many novel and sophisticated activation functions have been proposed to improve DNN accuracy, but they also consume massive memory during training with back-propagation. In this study, we propose nested forward automatic differentiation (Forward-AD), specifically for element-wise activation functions, for memory-efficient DNN training. We deploy nested Forward-AD in two widely used deep learning frameworks, TensorFlow and PyTorch, which support static and dynamic computation graphs, respectively. Our evaluation shows that nested Forward-AD reduces the memory footprint by up to 1.97x compared to the baseline model and outperforms recomputation by 20% under the same memory reduction ratio.
    Comment: 8 pages, ICCD 202
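    Why forward-mode AD helps for element-wise activations can be seen with dual numbers: the activation's output and its local derivative are produced in one pass, so the backward pass does not need the stored pre-activation tensor for that op. The sketch below uses softplus as the activation; the class and function names are invented for illustration, not the paper's implementation.

```python
import math

class Dual:
    """Toy forward-mode AD number: a primal value plus a tangent."""
    def __init__(self, val, dot):
        self.val = val
        self.dot = dot

def softplus(d):
    """softplus(x) = log(1 + e^x), propagating the tangent in the same
    pass (the derivative is the sigmoid of x)."""
    e = math.exp(d.val)
    return Dual(math.log1p(e), (e / (1.0 + e)) * d.dot)

def act_with_grad(x):
    """Evaluate the activation and its local derivative together, so the
    backward pass needs no stored pre-activation tensor for this op."""
    out = softplus(Dual(x, 1.0))   # seed tangent 1.0 => dot equals f'(x)
    return out.val, out.dot

y, dy = act_with_grad(0.0)
print(round(y, 4), round(dy, 4))   # 0.6931 0.5
```

    For an element-wise function the tangent is a scalar per element, which is what makes nesting forward mode inside reverse mode cheap in memory.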

    Upconversion Luminescence and Magnetic Tuning of NaLuF4

    Fluorescent and magnetic bifunctional NaLuF4:Yb3+/Tm3+/Gd3+ nanocrystals were synthesized by the solvothermal method and subsequent surface modification. By changing the doping concentration of Gd3+, the shape, size, luminescent properties, and magnetic properties of the nanoparticles can be modulated. These NaLuF4:Yb3+/Tm3+/Gd3+ nanocrystals present efficient blue upconversion fluorescence and excellent paramagnetic properties at room temperature. Based on luminescence resonance energy transfer (LRET), the upconversion nanoparticles (UCNPs) were confirmed to be an efficient fluorescent nanoprobe for detecting acriflavine. The concentration of acriflavine is easily derived from the integral intensity ratio of the green (emission from acriflavine) to the blue (emission from the UCNPs) fluorescent signals. Based on this upconversion fluorescent nanoprobe, the detection limit for acriflavine can reach as low as 0.32 μg/mL.
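    The ratiometric readout described above amounts to a calibration curve: concentration as a function of the green/blue intensity ratio. The sketch below fits a least-squares line through calibration standards; the standards, slope, and intercept are entirely invented placeholders, as the abstract reports no calibration data.

```python
# Hypothetical ratiometric calibration for the acriflavine nanoprobe.
def fit_line(ratios, concs):
    """Least-squares line conc = a * ratio + b through calibration points."""
    n = len(ratios)
    mx = sum(ratios) / n
    my = sum(concs) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(ratios, concs))
         / sum((x - mx) ** 2 for x in ratios))
    return a, my - a * mx

# Invented calibration standards: (green/blue ratio, concentration in ug/mL)
ratios = [0.10, 0.25, 0.50, 0.80]
concs  = [1.0,  2.5,  5.0,  8.0]
a, b = fit_line(ratios, concs)

unknown_ratio = 0.40
print(a * unknown_ratio + b)   # estimated concentration in ug/mL
```

    A real assay would fit this line to measured standards and quote the detection limit from the calibration's noise floor.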

    Line identification of extreme ultraviolet spectra from aluminum ions in EAST Tokamak plasmas

    Extreme ultraviolet (EUV) spectra emitted from aluminum in the 5–340 Å wavelength range were observed in Experimental Advanced Superconducting Tokamak (EAST) discharges. Several spectral lines from aluminum ions with different degrees of ionization were successfully observed with sufficient spectral intensities and resolutions using three fast-time-response EUV spectrometers. The line identification uses three independent state-of-the-art computational codes for the atomic structure calculations, which provide the wavelengths and radiative transition probabilities (rate coefficients). These programs are HULLAC (Hebrew University - Lawrence Livermore Atomic Code), AUTOSTRUCTURE, and FAC (Flexible Atomic Code). Using three different codes allows us to resolve some ambiguities in identifying certain spectral lines and to assess the validity of the theoretical predictions.

    Production and characterization of a recombinant single-chain antibody against Hantaan virus envelope glycoprotein

    Hantaan virus (HTNV) is the prototype of the genus Hantavirus and causes hemorrhagic fever with renal syndrome, for which no specific therapeutics are available so far. Cell type-specific internalizing antibodies can be used to deliver therapeutics intracellularly to target cells and thus have potential application against HTNV infection. To achieve intracellular delivery of therapeutics, it is necessary to obtain antibodies that demonstrate sufficient cell type-specific binding, internalization, and the desired cellular trafficking. Here, we describe the prokaryotic expression, affinity purification, and functional testing of a single-chain Fv antibody fragment (scFv) against the HTNV envelope glycoprotein (GP), an HTNV-specific antigen normally located on the membranes of HTNV-infected cells. This HTNV GP-targeting antibody, scFv3G1, was produced in the cytoplasm of Escherichia coli cells as a soluble protein and was purified by immobilized metal affinity chromatography. The purified scFv possessed high specific antigen-binding activity toward HTNV GP and HTNV-infected Vero E6 cells and could be internalized into HTNV-infected cells, probably through a clathrin-dependent endocytosis pathway similar to that observed for transferrin. Our results show that the E. coli-produced scFv has potential applications in the targeted, intracellular delivery of therapeutics against HTNV infections.