35 research outputs found

    Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language Models

    Recent advancements in Multimodal Large Language Models (MLLMs) have significantly enhanced the comprehension of multimedia content, bringing together diverse modalities such as text, images, and videos. However, a critical challenge faced by these models, especially when processing video inputs, is the occurrence of hallucinations, that is, erroneous perceptions or interpretations, particularly at the event level. This study introduces an innovative method to address event-level hallucinations in MLLMs, focusing on specific temporal understanding in video content. Our approach leverages a novel framework that extracts and utilizes event-specific information from both the event query and the provided video to refine MLLMs' responses. We propose a unique mechanism that decomposes on-demand event queries into iconic actions. Subsequently, we employ models like CLIP and BLIP2 to predict specific timestamps for event occurrences. Our evaluation, conducted using the Charades-STA dataset, demonstrates a significant reduction in temporal hallucinations and an improvement in the quality of event-related responses. This research not only provides a new perspective in addressing a critical limitation of MLLMs but also contributes a quantitatively measurable method for evaluating MLLMs in the context of temporal-related questions. Comment: 7 pages, 7 figures

    Synthesis of titanium nitride for self-aligned gate AlGaN/GaN heterostructure field-effect transistors

    In this study, titanium nitride (TiN) is synthesized using reactive sputtering for a self-aligned gate process. The Schottky barrier height of the TiN on n-GaN is around 0.5 to 0.6 eV and remains virtually constant with varying nitrogen ratios. Compared with the conventional Ni electrode, the TiN electrode presents a lower turn-on voltage, while its reverse leakage current is comparable with that of Ni. The results of annealing evaluation at different temperatures and durations show that the TiN/W/Au gate stack can withstand the ohmic annealing process at 800°C for 1 or 3 min. Finally, the self-aligned TiN-gated AlGaN/GaN heterostructure field-effect transistors are obtained with good pinch-off characteristics.

    Highly exposed {001} facets of titanium dioxide modified with reduced graphene oxide for dopamine sensing

    Titanium dioxide (TiO2) with highly exposed {001} facets was synthesized through a facile solvo-thermal method and its surface was decorated with reduced graphene oxide (rGO) sheets. The morphology and chemical composition of the prepared rGO/TiO2 {001} nanocomposite were examined by using suitable characterization techniques. The rGO/TiO2 {001} nanocomposite was used to modify a glassy carbon electrode (GCE), which showed higher electrocatalytic activity towards the oxidation of dopamine (DA) and ascorbic acid (AA) than the unmodified GCE. The differential pulse voltammetric studies revealed good sensitivity and selectivity of the rGO/TiO2 {001} nanocomposite modified GCE for the detection of DA in the presence of AA. The modified GCE exhibited a low electrochemical detection limit of 6 μM over the linear range of 2–60 μM. Overall, this work provides a simple platform for the development of GCE modified with rGO/TiO2 {001} nanocomposite with highly exposed {001} facets for potential electrochemical sensing applications.

    Text Line Extraction in Natural Scene Images

    Main 1, Sub 1; Systems Information; Information and Intelligence Engineering

    Zero-Shot Video Grounding for Automatic Video Understanding in Sustainable Smart Cities

    Automatic video understanding is a crucial technology for promoting urban sustainability. Video grounding is a fundamental component of video understanding that has been evolving quickly in recent years, but its use is restricted by high labeling costs and by the performance limitations typically imposed by a pre-defined training dataset. In this paper, a novel atom-based zero-shot video grounding (AZVG) method is proposed to retrieve the segments in the video that correspond to a given input sentence. Although it is training-free, the performance of AZVG is competitive with weakly supervised methods and better than unsupervised SOTA methods on the Charades-STA dataset. The method can support flexible queries as well as different video content. It can play an important role in a wider range of urban living applications.

    Zero‐shot temporal event localisation: Label‐free, training‐free, domain‐free

    Temporal event localisation (TEL) has recently attracted increasing attention due to the rapid development of video platforms. Existing methods are based on either fully/weakly supervised or unsupervised learning, and thus they rely on expensive data annotation and time‐consuming training. Moreover, because these models are trained on domain-specific data, they generalise poorly under data distribution shifts. To cope with these difficulties, the authors propose a zero‐shot TEL method that can operate without training data or annotations. Leveraging large‐scale vision and language pre‐trained models, for example, CLIP, the authors address two key problems: (1) how to find the relevant region where the event is likely to occur; (2) how to determine the event duration once the relevant region is found. Query guided optimisation for local frame relevance, relying on the query‐to‐frame relationship, is proposed to find the most relevant frame region where the event is most likely to occur. A proposal generation method relying on the frame‐to‐frame relationship is proposed to determine the event duration. The authors also propose a greedy event sampling strategy to predict multiple durations with high reliability for the given event. The authors’ methodology is unique, offering a label‐free, training‐free, and domain‐free approach. It enables the application of TEL purely at the testing stage. The practical results show it achieves competitive performance on the standard Charades‐STA and ActivityCaptions datasets.
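    The query-to-frame idea in the abstract above can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes the query and each frame have already been embedded as plain vectors (for instance by a CLIP-style model, which is not called here), and the names `cosine`, `localise`, and the `threshold` parameter are hypothetical. It scores each frame against the query and returns the contiguous span with the largest accumulated relevance above the threshold.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def localise(query_vec, frame_vecs, threshold=0.5):
    """Return inclusive (start, end) frame indices of the most relevant span.

    Each frame contributes (similarity - threshold); the span with the largest
    summed contribution is found with a maximum-subarray scan. Returns None
    when no frame scores above the threshold. The threshold is an assumed
    tuning knob, not a value taken from the paper.
    """
    best_sum, best_span = 0.0, None
    run_sum, run_start = 0.0, 0
    for i, frame in enumerate(frame_vecs):
        gain = cosine(query_vec, frame) - threshold
        if run_sum <= 0:
            # Start a fresh candidate span at this frame.
            run_sum, run_start = gain, i
        else:
            run_sum += gain
        if run_sum > best_sum:
            best_sum, best_span = run_sum, (run_start, i)
    return best_span


if __name__ == "__main__":
    query = [1.0, 0.0]
    frames = [[0.0, 1.0], [1.0, 0.1], [1.0, 0.0], [0.9, 0.2], [0.0, 1.0]]
    print(localise(query, frames))
```

    Multiplying the returned frame indices by the sampling interval would give approximate timestamps; how frames are sampled and embedded is left open here.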

    Understanding Physicians’ Online-Offline Behavior Dynamics: An Empirical Study

    Physicians’ participation in online healthcare platforms serves to integrate online healthcare resources with the offline medical system. This integration brings opportunities for reshaping healthcare delivery systems. In the field of telemedicine, there has been an extensive discussion about physician participation, but little is known about how physicians actually participate in online healthcare platforms and offline medical systems. Understanding physicians’ participation dynamics between online and offline channels is of great importance to academic researchers, practitioners, and policymakers. Such an understanding can reveal how healthcare is actually delivered to patients through both channels, help quantify the social impacts of online healthcare services, and show how to improve healthcare delivery systems. Thus, in this study, we investigate physicians’ online-offline behavior dynamics using data from both online and offline channels. As physicians’ online and offline activities are highly endogenous, we deploy a time-series technique and develop a structural vector autoregression (SVAR) model to examine the behavior dynamics. We find that physicians’ online activities can lead to a higher service quantity in offline channels, whereas offline activities may reduce physicians’ online services due to resource constraints. Our results also show that the more offline patients physicians serve, the more articles the physicians will likely share online. These findings are robust to various econometric specifications and estimation methods. Our research advocates for the benefits Health 2.0 produces and provides evidence of the value of online healthcare communities and the policies that support them.

    A Through-Hole Lead Connection Method for Thin-Film Thermocouples on Turbine Blades

    To solve the current problems with thin-film thermocouple signals on turbine blades in ultra-high temperature environments, this study explores the use of a through-hole lead connection technology for high-temperature resistant nickel alloys. The technique includes through-hole processing, insulation layer preparation, and filling and fixing of a high-temperature resistant conductive paste. The through-hole lead connection preparation process was optimized by investigating the influence of the inner diameter of the through-hole, solder volume, and temperature treatment on the contact strength and surface roughness of the thin-film for contact resistance. Finally, the technology was combined with a thin-film thermocouple to perform multiple thermal cycling experiments on the surface of the turbine blade at a temperature of 1000 °C. The results show that the through-hole lead connection technology can achieve a stable output of the thin-film thermocouple signal on the turbine blade.