821 research outputs found

    Depth Assisted Full Resolution Network for Single Image-based View Synthesis

    Full text link
    Researches in novel viewpoint synthesis majorly focus on interpolation from multi-view input images. In this paper, we focus on a more challenging and ill-posed problem that is to synthesize novel viewpoints from one single input image. To achieve this goal, we propose a novel deep learning-based technique. We design a full resolution network that extracts local image features with the same resolution of the input, which contributes to derive high resolution and prevent blurry artifacts in the final synthesized images. We also involve a pre-trained depth estimation network into our system, and thus 3D information is able to be utilized to infer the flow field between the input and the target image. Since the depth network is trained by depth order information between arbitrary pairs of points in the scene, global image features are also involved into our system. Finally, a synthesis layer is used to not only warp the observed pixels to the desired positions but also hallucinate the missing pixels with recorded pixels. Experiments show that our technique performs well on images of various scenes, and outperforms the state-of-the-art techniques

    Language Models for Image Captioning: The Quirks and What Works

    Full text link
    Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence. The second uses the penultimate activation layer of the CNN as input to a recurrent neural network (RNN) that then generates the caption sequence. In this paper, we compare the merits of these different language modeling approaches for the first time by using the same state-of-the-art CNN as input. We examine issues in the different approaches, including linguistic irregularities, caption repetition, and data set overlap. By combining key aspects of the ME and RNN methods, we achieve a new record performance over previously published results on the benchmark COCO dataset. However, the gains we see in BLEU do not translate to human judgments.Comment: See http://research.microsoft.com/en-us/projects/image_captioning for project informatio

    Automatic Hallucination Assessment for Aligned Large Language Models via Transferable Adversarial Attacks

    Full text link
    Although remarkable progress has been achieved in preventing large language model (LLM) hallucinations using instruction tuning and retrieval augmentation, it remains challenging to measure the reliability of LLMs using human-crafted evaluation data which is not available for many tasks and domains and could suffer from data leakage. Inspired by adversarial machine learning, this paper aims to develop a method of automatically generating evaluation data by appropriately modifying existing data on which LLMs behave faithfully. Specifically, this paper presents AutoDebug, an LLM-based framework to use prompting chaining to generate transferable adversarial attacks in the form of question-answering examples. We seek to understand the extent to which these examples trigger the hallucination behaviors of LLMs. We implement AutoDebug using ChatGPT and evaluate the resulting two variants of a popular open-domain question-answering dataset, Natural Questions (NQ), on a collection of open-source and proprietary LLMs under various prompting settings. Our generated evaluation data is human-readable and, as we show, humans can answer these modified questions well. Nevertheless, we observe pronounced accuracy drops across multiple LLMs including GPT-4. Our experimental results show that LLMs are likely to hallucinate in two categories of question-answering scenarios where (1) there are conflicts between knowledge given in the prompt and their parametric knowledge, or (2) the knowledge expressed in the prompt is complex. Finally, we find that the adversarial examples generated by our method are transferable across all considered LLMs. The examples generated by a small model can be used to debug a much larger model, making our approach cost-effective

    A Novel Ehanced Move Recognition Algorithm Based on Pre-trained Models with Positional Embeddings

    Full text link
    The recognition of abstracts is crucial for effectively locating the content and clarifying the article. Existing move recognition algorithms lack the ability to learn word position information to obtain contextual semantics. This paper proposes a novel enhanced move recognition algorithm with an improved pre-trained model and a gated network with attention mechanism for unstructured abstracts of Chinese scientific and technological papers. The proposed algorithm first performs summary data segmentation and vocabulary training. The EP-ERNIE_\_AT-GRU framework is leveraged to incorporate word positional information, facilitating deep semantic learning and targeted feature extraction. Experimental results demonstrate that the proposed algorithm achieves 13.37%\% higher accuracy on the split dataset than on the original dataset and a 7.55%\% improvement in accuracy over the basic comparison model

    Soil Liquid Limit and Plastic Limit Treating System Based on Analytic Method

    Get PDF
    AbstractAccording to two present China national standards, a software as Soil Liquid Limit and Plastic Limit Data Treating System, with analytic method, was developed using object-oriented visual programming tool. The analytic method used in the developed system was different to traditional method of treating soil liquid limit and plastic limit data. N-S algorithm flowchart demonstrated that switch statement and condition statement were taken as main algorithm and second level select nested structure was taken as main frame for the developed system. Three kinds of soil specimens were tested with liquid and plastic limit combined test and the test data was treated with graphic method, Excel software and Soil Liquid Limit and Plastic Limit Data Treating System. The comparative conclusion indicated that Soil Liquid Limit and Plastic Limit Data Treating System improved efficiency and accuracy evidently for treating soil liquid and plastic limit data and had advantages of easy operation and high reliability

    Learning for Semantic Knowledge Base-Guided Online Feature Transmission in Dynamic Channels

    Full text link
    With the proliferation of edge computing, efficient AI inference on edge devices has become essential for intelligent applications such as autonomous vehicles and VR/AR. In this context, we address the problem of efficient remote object recognition by optimizing feature transmission between mobile devices and edge servers. We propose an online optimization framework to address the challenge of dynamic channel conditions and device mobility in an end-to-end communication system. Our approach builds upon existing methods by leveraging a semantic knowledge base to drive multi-level feature transmission, accounting for temporal factors and dynamic elements throughout the transmission process. To solve the online optimization problem, we design a novel soft actor-critic-based deep reinforcement learning system with a carefully designed reward function for real-time decision-making, overcoming the optimization difficulty of the NP-hard problem and achieving the minimization of semantic loss while respecting latency constraints. Numerical results showcase the superiority of our approach compared to traditional greedy methods under various system setups.Comment: 6 page
    corecore