Consensus Graph Representation Learning for Better Grounded Image Captioning
The contemporary visual captioning models frequently hallucinate objects that
are not actually in a scene, due to the visual misclassification or
over-reliance on priors that result in semantic inconsistency between
the visual information and the target lexical words. The most common remedy is to
encourage the captioning model to dynamically link generated object words or
phrases to appropriate regions of the image, i.e., the grounded image
captioning (GIC). However, GIC utilizes an auxiliary task (grounding objects)
that has not solved the key issue of object hallucination, i.e., the semantic
inconsistency. In this paper, we take a novel perspective on the issue above -
exploiting the semantic coherency between the visual and language modalities.
Specifically, we propose the Consensus Graph Representation Learning framework
(CGRL) for GIC that incorporates a consensus representation into the grounded
captioning pipeline. The consensus is learned by aligning the visual graph
(e.g., scene graph) to the language graph, considering both the nodes and
edges in a graph. With the aligned consensus, the captioning model can capture
both the correct linguistic characteristics and visual relevance, and then
ground the appropriate image regions. We validate the effectiveness of
our model, with a significant decline in object hallucination (-9% CHAIRi) on
the Flickr30k Entities dataset. In addition, CGRL is evaluated with several
automatic metrics and human evaluation; the results indicate that the proposed
approach can simultaneously improve the performance of image captioning (+2.9
CIDEr) and grounding (+2.3 F1LOC).
Comment: 9 pages, 5 figures, AAAI 202
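For readers unfamiliar with the hallucination metric cited in this abstract: CHAIRi is commonly defined as the fraction of object mentions in generated captions that do not appear in the image's ground-truth object set. The following is a minimal sketch of that ratio, assuming captions have already been reduced to lists of object words. The function name and the toy data are hypothetical and are not taken from the paper.

```python
def chair_i(mentioned_objects, ground_truth_objects):
    """Share of mentioned object instances that are absent from the ground truth.

    mentioned_objects: one list of object words per generated caption.
    ground_truth_objects: one list of objects actually present per image.
    """
    total = 0
    hallucinated = 0
    for mentions, truth in zip(mentioned_objects, ground_truth_objects):
        truth_set = set(truth)
        for obj in mentions:
            total += 1
            if obj not in truth_set:
                hallucinated += 1
    # Guard against an empty evaluation set.
    return hallucinated / total if total else 0.0


# Toy example (hypothetical data): "frisbee" and "hat" are hallucinated.
captions = [["dog", "frisbee"], ["man", "horse", "hat"]]
truths = [["dog", "grass"], ["man", "horse"]]
score = chair_i(captions, truths)
```

A lower value is better, so a reported decline in CHAIRi means fewer hallucinated object mentions.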
Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos
Understanding human emotions is a crucial ability for intelligent robots to
provide better human-robot interactions. The existing works are limited to
trimmed video-level emotion classification, failing to locate the temporal
window corresponding to the emotion. In this paper, we introduce a new task,
named Temporal Emotion Localization in videos (TEL), which aims to detect human
emotions and localize their corresponding temporal boundaries in untrimmed
videos with aligned subtitles. TEL presents three unique challenges compared to
temporal action localization: 1) The emotions have extremely varied temporal
dynamics; 2) The emotion cues are embedded in both appearances and complex
plots; 3) The fine-grained temporal annotations are complicated and
labor-intensive. To address the first two challenges, we propose a novel
dilated context integrated network with a coarse-fine two-stream architecture.
The coarse stream captures varied temporal dynamics by modeling
multi-granularity temporal contexts. The fine stream achieves complex plot
understanding by reasoning about the dependencies between the multi-granularity
temporal contexts from the coarse stream and adaptively integrates them into
fine-grained video segment features. To address the third challenge, we
introduce a cross-modal consensus learning paradigm, which leverages the
inherent semantic consensus between the aligned video and subtitle to achieve
weakly-supervised learning. We contribute a new testing set with 3,000
manually-annotated temporal boundaries so that future research on the TEL
problem can be quantitatively evaluated. Extensive experiments show the
effectiveness of our approach on temporal emotion localization. The repository
of this work is at
https://github.com/YYJMJC/Temporal-Emotion-Localization-in-Videos.
Comment: Accepted by ACM Multimedia 202
Review on Research Progress of Hydraulic Powered Soft Actuators
Soft actuators have received extensive attention in robotics and smart device applications due to their distinctive dexterity and compliance. Among them, hydraulic soft actuators play an important role in the area because they have much higher specific power and power density than other types such as pneumatic soft actuators. Nevertheless, the deformation of flexible materials in soft actuators brings about inherent hysteresis and nonlinearity, which severely hinders them from producing the desired movement in the presence of advanced control strategies. In this paper, previous research efforts made to enhance the driving capability and actuation efficiency of hydraulic soft actuators are illustrated and analyzed from the three aspects of architecture, materials, and control strategy. Meanwhile, the issues and challenges that have emerged when developing hydraulic soft actuators are discussed. Finally, the potential future development of hydraulic powered soft actuators is discussed.
Terahertz photoconductive antenna with all-dielectric nanopillars
Photoconductive antennas (PCAs), as a popular terahertz (THz) radiation source, have been widely used in spectroscopy, material characterization, biological imaging and detection of hazardous materials. However, PCAs have a relatively low energy conversion efficiency from femtosecond laser pulses to THz radiation, which often limits the signal-to-noise ratio and bandwidth of THz imaging and spectroscopy systems. To address these limitations, here we report a THz photoconductive antenna emitter with all-dielectric nanopillars integrated on top of the SI-GaAs substrate to increase the generated photocarriers, which achieves a broadband and frequency-insensitive THz power enhancement factor of around 1.25 at frequencies of 0.05 - 1.6 THz. The results reported here provide a new method for increasing the THz power of PCAs, which paves the way for subsequent research on next-generation PCAs.
Commercial dishes with gelatin-free microstructured inserts for elongated stem cell self-renewal and pluripotency
Summary: Here, we report the scalable fabrication of 2i-functionalized micro-pyramid-array (μPyA/+2i) inserts for use in commercial multi-well plates, as an alternative cultivation platform for maintaining long-term self-renewal and pluripotency of multiple mESCs and mouse induced pluripotent stem cells. Relevant evidence is provided, including cell morphology characterization, increased alkaline phosphatase activity, high expression of mESC self-renewal markers, decreased levels of differentiation-associated markers, and a high proportion of self-renewal marker cells. Further studies demonstrated that μPyA/+2i could cause a higher cell density in mESC colonies and induce gene expression changes. Subsequent studies showed that μPyA/+2i can influence the cytoskeleton and promote cell adhesion through Cldn-7 upregulation. In summary, these μPyA/+2i inserts offer flexible and gelatin-free micro-environments to maintain long-term self-renewal and pluripotency of mESCs. The microstructured inserts enable facile stem cell manipulation and transfer among culture dishes, which will broaden the use of stem cells in both routine and translational applications.
Application of Artificial Intelligence Methods for Imaging of Spinal Metastasis
10.3390/cancers14164025 CANCERS141
Deep Learning Model for Classifying Metastatic Epidural Spinal Cord Compression on MRI
10.3389/fonc.2022.849447 FRONTIERS IN ONCOLOGY1
Diagnostic Accuracy of CT for Metastatic Epidural Spinal Cord Compression
10.3390/cancers14174231 CANCERS141