12 research outputs found

    Consensus Graph Representation Learning for Better Grounded Image Captioning

    Contemporary visual captioning models frequently hallucinate objects that are not actually in a scene, due to visual misclassification or over-reliance on priors that result in semantic inconsistency between the visual information and the target lexical words. The most common remedy is to encourage the captioning model to dynamically link generated object words or phrases to appropriate regions of the image, i.e., grounded image captioning (GIC). However, GIC utilizes an auxiliary task (grounding objects) that has not solved the key issue of object hallucination, i.e., the semantic inconsistency. In this paper, we take a novel perspective on the issue above: exploiting the semantic coherency between the visual and language modalities. Specifically, we propose the Consensus Graph Representation Learning framework (CGRL) for GIC, which incorporates a consensus representation into the grounded captioning pipeline. The consensus is learned by aligning the visual graph (e.g., scene graph) to the language graph in a way that considers both the nodes and edges in a graph. With the aligned consensus, the captioning model can capture both the correct linguistic characteristics and visual relevance, and then ground appropriate image regions further. We validate the effectiveness of our model, with a significant decline in object hallucination (-9% CHAIRi) on the Flickr30k Entities dataset. In addition, CGRL is evaluated by several automatic metrics and human evaluation; the results indicate that the proposed approach can simultaneously improve the performance of image captioning (+2.9 CIDEr) and grounding (+2.3 F1LOC). Comment: 9 pages, 5 figures, AAAI 202

    Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos

    Understanding human emotions is a crucial ability for intelligent robots to provide better human-robot interactions. Existing works are limited to trimmed video-level emotion classification, failing to locate the temporal window corresponding to the emotion. In this paper, we introduce a new task, named Temporal Emotion Localization in videos (TEL), which aims to detect human emotions and localize their corresponding temporal boundaries in untrimmed videos with aligned subtitles. TEL presents three unique challenges compared to temporal action localization: 1) the emotions have extremely varied temporal dynamics; 2) the emotion cues are embedded in both appearances and complex plots; 3) the fine-grained temporal annotations are complicated and labor-intensive. To address the first two challenges, we propose a novel dilated context integrated network with a coarse-fine two-stream architecture. The coarse stream captures varied temporal dynamics by modeling multi-granularity temporal contexts. The fine stream achieves complex plot understanding by reasoning about the dependency between the multi-granularity temporal contexts from the coarse stream and adaptively integrating them into fine-grained video segment features. To address the third challenge, we introduce a cross-modal consensus learning paradigm, which leverages the inherent semantic consensus between the aligned video and subtitle to achieve weakly-supervised learning. We contribute a new testing set with 3,000 manually-annotated temporal boundaries so that future research on the TEL problem can be quantitatively evaluated. Extensive experiments show the effectiveness of our approach on temporal emotion localization. The repository of this work is at https://github.com/YYJMJC/Temporal-Emotion-Localization-in-Videos. Comment: Accepted by ACM Multimedia 202

    Review on Research Progress of Hydraulic Powered Soft Actuators

    Soft actuators have received extensive attention in robotics and smart device applications due to their distinctive dexterity and compliance. Among them, hydraulic soft actuators play an important role in the area because they have much higher specific power and power density than other types such as pneumatic soft actuators. Nevertheless, the deformation of flexible materials in soft actuators brings about inherent hysteresis and nonlinearity, which severely hinders them from producing the desired movement in the presence of advanced control strategies. In this paper, previous research efforts made to enhance the driving capability and actuation efficiency of hydraulic soft actuators are illustrated and analyzed from the three aspects of architecture, materials, and control strategy. Meanwhile, the issues and challenges that have emerged when developing hydraulic soft actuators are discussed. Finally, the potential future development of hydraulic powered soft actuators is discussed.

    Terahertz photoconductive antenna with all-dielectric nanopillars

    Photoconductive antennas (PCAs), as a popular terahertz (THz) radiation source, have been widely used in spectroscopy, material characterization, biological imaging and detection of hazardous materials. However, PCAs have a relatively low energy conversion efficiency from femtosecond laser pulses to THz radiation, which often limits the signal-to-noise ratio and bandwidth of THz imaging and spectroscopy systems. To address these limitations, here we report a THz photoconductive antenna emitter with all-dielectric nanopillars integrated on top of the SI-GaAs substrate to increase the generated photocarriers, which achieves a broadband and frequency-insensitive THz power enhancement factor of around 1.25 at frequencies of 0.05 to 1.6 THz. The results reported here provide a new method for increasing the THz power of PCAs, which paves the way for subsequent research on next-generation PCAs.

    Commercial dishes with gelatin-free microstructured inserts for elongated stem cell self-renewal and pluripotency

    Summary: Here, we report the scalable fabrication of 2i-functionalized micro-pyramid-array (μPyA/+2i) inserts for use in commercial multi-well plates, as an alternative cultivation platform for maintaining long-term self-renewal and pluripotency of multiple mESCs and mouse induced pluripotent stem cells. Relevant evidence is provided, including cell morphology characterization, increased alkaline phosphatase activity, high expression of mESC self-renewal markers, decreased levels of differentiation-associated markers, and a high proportion of self-renewal marker cells. Further studies demonstrated that μPyA/+2i could cause a higher cell density in mESC colonies and induce gene expression changes. Subsequent studies showed that μPyA/+2i can influence the cytoskeleton and promote cell adhesion through Cldn-7 upregulation. In summary, these μPyA/+2i inserts offer flexible and gelatin-free microenvironments to maintain long-term self-renewal and pluripotency of mESCs. Enabled by the microstructured inserts, facile stem cell manipulation and transfer among culture dishes will broaden the use of stem cells in both routine and translational applications.