21 research outputs found

    Scene Consistency Representation Learning for Video Scene Segmentation

    A long-term video, such as a movie or TV show, is composed of various scenes, each of which represents a series of shots sharing the same semantic story. Spotting the correct scene boundary from the long-term video is a challenging task, since a model must understand the storyline of the video to figure out where a scene starts and ends. To this end, we propose an effective Self-Supervised Learning (SSL) framework to learn better shot representations from unlabeled long-term videos. More specifically, we present an SSL scheme to achieve scene consistency, while exploring considerable data augmentation and shuffling methods to boost the model generalizability. Instead of explicitly learning the scene boundary features as in the previous methods, we introduce a vanilla temporal model with less inductive bias to verify the quality of the shot features. Our method achieves state-of-the-art performance on the task of Video Scene Segmentation. Additionally, we suggest a fairer and more reasonable benchmark to evaluate the performance of Video Scene Segmentation methods. The code is made available. Comment: Accepted to CVPR 202

    Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond

    In this study, we explore the potential of Multimodal Large Language Models (MLLMs) in improving embodied decision-making processes for agents. While Large Language Models (LLMs) have been widely used due to their advanced reasoning skills and vast world knowledge, MLLMs like GPT4-Vision offer enhanced visual understanding and reasoning capabilities. We investigate whether state-of-the-art MLLMs can handle embodied decision-making in an end-to-end manner and whether collaborations between LLMs and MLLMs can enhance decision-making. To address these questions, we introduce a new benchmark called PCA-EVAL, which evaluates embodied decision-making from the perspectives of Perception, Cognition, and Action. Additionally, we propose HOLMES, a multi-agent cooperation framework that allows LLMs to leverage MLLMs and APIs to gather multimodal information for informed decision-making. We compare end-to-end embodied decision-making and HOLMES on our benchmark and find that the GPT4-Vision model demonstrates strong end-to-end embodied decision-making abilities, outperforming GPT4-HOLMES in terms of average decision accuracy (+3%). However, this performance is exclusive to the latest GPT4-Vision model, surpassing the open-source state-of-the-art MLLM by 26%. Our results indicate that powerful MLLMs like GPT4-Vision hold promise for decision-making in embodied agents, offering new avenues for MLLM research. Code and data are open at https://github.com/pkunlp-icler/PCA-EVAL/. Comment: FMDM@NeurIPS2023, Code and data: https://github.com/pkunlp-icler/PCA-EVAL

    Chalcogenide Glass-on-Graphene Photonics

    Two-dimensional (2-D) materials are of tremendous interest to integrated photonics given their singular optical characteristics spanning light emission, modulation, saturable absorption, and nonlinear optics. To harness their optical properties, these atomically thin materials are usually attached onto prefabricated devices via a transfer process. In this paper, we present a new route for 2-D material integration with planar photonics. Central to this approach is the use of chalcogenide glass, a multifunctional material which can be directly deposited and patterned on a wide variety of 2-D materials and can simultaneously function as the light guiding medium, a gate dielectric, and a passivation layer for 2-D materials. Besides claiming improved fabrication yield and throughput compared to the traditional transfer process, our technique also enables unconventional multilayer device geometries optimally designed for enhancing light-matter interactions in the 2-D layers. Capitalizing on this facile integration method, we demonstrate a series of high-performance glass-on-graphene devices including ultra-broadband on-chip polarizers, energy-efficient thermo-optic switches, as well as graphene-based mid-infrared (mid-IR) waveguide-integrated photodetectors and modulators.

    Pulsation behavior of a bubble generated by a deep underwater explosion

    This paper reports on deep underwater explosion (UNDEX) experiments conducted in a pressure container. The bubble pulsation behavior due to the deep UNDEX is recorded by a high-speed camera for equivalent depths up to 350 m. The bubble images show that, although the shape of the explosive package affects the bubble shape at the initial moment, the bubble readily becomes spherical in shallow water (0.8 m and 100 m depth) but never becomes spherical during the entire first pulsation in deep water (200 m, 300 m, and 350 m in this paper). Solutions of the Rayleigh–Plesset equation fit the experimental data well, and the value of the polytropic index γ of the gaseous detonation products changes from 1.25 to 1.3 as the depth increases. Finally, empirical laws governing the pulsation of a deep-UNDEX bubble are established. The pulsation periods measured experimentally and computed from the Rayleigh–Plesset equation agree with the empirical value, but the maximum radius is smaller than the empirical one. This shows that the water depth not only creates a high hydrostatic pressure for the bubble but also changes the energy-release process of a deep UNDEX.
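    For orientation, a common textbook form of the Rayleigh–Plesset equation for a spherical gas bubble is sketched below. This is an assumption about the general model family, not the exact formulation used in the paper: surface tension and viscosity are neglected, the gas is taken to follow a polytropic law with index γ, and the symbols R_0, p_0, p_atm and h (reference radius, reference gas pressure, atmospheric pressure, equivalent depth) are introduced here for illustration only.

```latex
\[
R\,\ddot{R} + \tfrac{3}{2}\,\dot{R}^{2}
  = \frac{1}{\rho}\left[\,p_{0}\left(\frac{R_{0}}{R}\right)^{3\gamma} - p_{\infty}\right],
\qquad
p_{\infty} = p_{\mathrm{atm}} + \rho g h
\]
```

    Here R(t) is the bubble radius and ρ the water density. In this form, a larger depth h raises the ambient pressure p_∞, and γ sets how quickly the gas pressure falls as the bubble expands.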

    Highly Sensitive Paper-Based Force Sensors with Natural Micro-Nanostructure Sensitive Element

    Flexible paper-based force sensors have garnered significant attention for their potential applications in healthcare wearables, portable electronics, etc. However, most studies have only used paper as the flexible substrate for sensors, not fully exploiting the potential of paper’s micro-nanostructure for sensing. This article proposes a novel approach where paper serves both as the sensitive element and the flexible substrate of force sensors. Under external mechanical forces, the micro-nanostructure of the conductive-treated paper changes, leading to significant changes in the related electrical output and thus enabling sensing. To demonstrate the feasibility and universality of this new method, the article takes paper-based capacitive pressure sensors and paper-based resistive strain sensors as examples, detailing their fabrication processes, constructing sensing principle models based on the micro-nanostructure of paper materials, and testing their main sensing performance. The capacitive paper-based pressure sensor achieves a high sensitivity of 1.623 kPa⁻¹, a fast response time of 240 ms, and a minimum pressure resolution of 4.1 Pa. The resistive paper-based strain sensor achieves a high sensitivity of 72 and a fast response time of 300 ms. The proposed method offers advantages such as high sensitivity, a simple fabrication process, environmental friendliness, and cost-effectiveness, providing new insights into the research of flexible force sensors.

    Improved Synthesis of a Novel Biodegradable Tunable Micellar Polymer Based on Partially Hydrogenated Poly(β-malic Acid-co-benzyl Malate)

    Poly(benzyl malate) (PBM), together with its derivatives, has been studied as a nanocarrier for biomedical applications due to its superior biocompatibility and biodegradability. PBM is obtained primarily through chemical routes, which can offer polymers with controlled molecular weight and a unique controllable morphology. The frequently used synthesis from L-aspartic acid gives an overall yield of 4.5%. In this work, a novel synthesis route with malic acid as the initiator was successfully designed and optimized, increasing the reaction yield up to 31.2%. Furthermore, a crystalline form of PBM (PBM-2), polymerized from high optical purity benzyl-β-malolactonate (MLABn), was discovered during the optimization process. X-ray diffraction (XRD) patterns revealed that the crystalline PBM-2 had obvious diffraction peaks, demonstrating that its internal atoms were arranged in a more orderly manner and were different from the amorphous PBM-1 prepared from the racemic MLABn. The differential scanning calorimetry (DSC) curves and thermogravimetric curves elucidated the different thermal behaviors of PBM-1 and PBM-2. The degradation curves and scanning electron microscopy (SEM) images further demonstrated the biodegradability of the PBM forms, which have different crystal structures. The hardness of PBM-2 implied a potential application in bone regeneration, but it also reduced solubility compared with PBM-1, which made PBM-2 difficult to dissolve and hydrogenate. The solution was therefore heated up to 75 °C to achieve benzyl deprotection, and a series of partially hydrogenated PBM was subsequently prepared. Their hydrogenation rates were screened to determine the optimal conditions for the formation of micelles suitable for drug-carrier applications. In summary, the synthesis route from malic acid produces PBM in a shorter time and with a higher yield. The biodegradability, biosafety, mechanical properties, and adjustable hydrogenation widen the application of PBM with tunable properties as drug carriers.

    Learning-driven service caching in MEC networks with bursty data traffic and uncertain delays

    Mobile edge computing (MEC) provides extremely low-latency services for mobile users, by attaching computing resources to 5G base stations in an MEC network. Network service providers can cache their services from remote data centers to base stations to serve mobile users within their proximity, thereby reducing service latencies. However, mobile users of network services usually have bursty requests that require immediate processing in the MEC network. The data traffic of such requests is bursty, since mobile users have various hidden features including locations, group tags, and mobility patterns. Furthermore, such bursty data traffic causes uncertain congestion at base stations and thus leads to uncertain processing delays. Given the limited resources of base stations, network services cannot always be placed at base stations permanently to handle the bursty data traffic. As such, network services need to be dynamically cached in the MEC network to fully address the bursty data traffic and uncertain processing delays. In this paper, we investigate the problem of dynamic service caching and task offloading in an MEC network, by adopting the online learning technique to address the challenges brought by bursty data traffic and uncertain processing delays. We first propose an online learning algorithm for the problem with uncertain processing delays, by utilizing the multi-armed bandits technique, and analyze the regret bound of the proposed algorithm. We then propose another online learning algorithm for the problem with bursty data traffic and uncertain processing delays, which adaptively learns the bursty data traffic of requests, based on small samples of mobile users’ hidden features. We also propose a novel architecture of Generative Adversarial Network (GAN) to accurately predict user demands using small samples of mobile users’ hidden features. Based on the proposed GAN model, we then devise an efficient heuristic for the problem with the uncertainties of both bursty data traffic and processing delays. We finally evaluate the performance of the proposed algorithms by simulations, using a real data trace. Experimental results show that the proposed algorithms outperform existing ones by up to 44% in terms of average delay.
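    To illustrate the multi-armed bandits idea mentioned in the abstract above, here is a minimal Python sketch that uses the standard UCB1 rule to learn which of several base stations has the lowest mean processing delay. The abstract does not say which bandit algorithm the paper uses, so UCB1 is a stand-in; the function names, the three mean delays, and the uniform noise model are all hypothetical.

```python
import math
import random


def ucb1_select(counts, mean_rewards, t):
    """Return the arm with the highest UCB1 score at round t (1-based)."""
    # Play every arm once before relying on the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        mean_rewards[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_ucb1(true_mean_delays, rounds, max_delay=1.0, seed=0):
    """Simulate UCB1 over hypothetical base stations with noisy delays."""
    rng = random.Random(seed)
    k = len(true_mean_delays)
    counts = [0] * k            # times each base station was chosen
    mean_rewards = [0.0] * k    # running mean reward per base station
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, mean_rewards, t)
        # Assumed delay model: true mean plus uniform noise, clipped to [0, max_delay].
        delay = true_mean_delays[arm] + rng.uniform(-0.1, 0.1)
        delay = min(max(delay, 0.0), max_delay)
        reward = 1.0 - delay / max_delay  # lower delay gives higher reward
        counts[arm] += 1
        # Incremental update of the running mean.
        mean_rewards[arm] += (reward - mean_rewards[arm]) / counts[arm]
    return counts


# Three hypothetical base stations; index 1 has the lowest mean delay.
counts = run_ucb1([0.8, 0.3, 0.6], rounds=2000)
print(counts)
```

    Over enough rounds the selection counts should concentrate on the base station with the lowest mean delay, while the others are still sampled occasionally to keep their estimates current.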