Understanding Video Scenes through Text: Insights from Text-based Video Question Answering
Researchers have extensively studied the field of vision and language,
discovering that both visual and textual content are crucial for understanding
scenes effectively. Particularly, comprehending text in videos holds great
significance, requiring both scene text understanding and temporal reasoning.
This paper focuses on exploring two recently introduced datasets, NewsVideoQA
and M4-ViteVQA, which aim to address video question answering based on textual
content. The NewsVideoQA dataset contains question-answer pairs related to the
text in news videos, while M4-ViteVQA comprises question-answer pairs from
diverse categories like vlogging, traveling, and shopping. We provide an
analysis of the formulation of these datasets on various levels, exploring the
degree of visual understanding and multi-frame comprehension required for
answering the questions. Additionally, the study includes experimentation with
BERT-QA, a text-only model, which demonstrates comparable performance to the
original methods on both datasets, indicating the shortcomings in the
formulation of these datasets. Furthermore, we also look into the domain
adaptation aspect by examining the effectiveness of training on M4-ViteVQA and
evaluating on NewsVideoQA and vice-versa, thereby shedding light on the
challenges and potential benefits of out-of-domain training.
Successful Treatment of Postoperative External Biliary Fistula by Selective Nasobiliary Drainage
A 25-year-old man presented with a high-output external biliary fistula after an operation for a giant
hydatid cyst of the liver. Endoscopic sphincterotomy was inadequate to close the fistula. A nasobiliary
tube was selectively inserted into the leaking hepatic duct and bile was continuously aspirated. The
fistula and the residual cavity healed completely. Details of the patient's management using this
alternative technique are discussed.
Reading Between the Lanes: Text VideoQA on the Road
Text and signs around roads provide crucial information for drivers, vital
for safe navigation and situational awareness. Scene text recognition in motion
is a challenging problem: textual cues typically appear only for a short time
span, and early detection at a distance is necessary. Systems that exploit such
information to assist the driver should not only extract and incorporate visual
and textual cues from the video stream but also reason over time. To address
this issue, we introduce RoadTextVQA, a new dataset for the task of video
question answering (VideoQA) in the context of driver assistance. RoadTextVQA
consists of driving videos collected from multiple countries, annotated
with questions, all based on text or road signs present in the driving
videos. We assess the performance of state-of-the-art video question answering
models on our RoadTextVQA dataset, highlighting the significant potential for
improvement in this domain and the usefulness of the dataset in advancing
research on in-vehicle support systems and text-aware multimodal question
answering. The dataset is available at
http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtextvq
Diffusive behavior for randomly kicked Newtonian particles in a spatially periodic medium
We prove a central limit theorem for the momentum distribution of a particle
undergoing an unbiased spatially periodic random forcing at exponentially
distributed times without friction. The starting point is a linear Boltzmann equation
for the phase space density, where the average energy of the particle grows
linearly in time. Rescaling time, the momentum converges to a Brownian motion,
and the position is its time-integral, showing superdiffusive scaling with time
$t^{3/2}$. The analysis has two parts: (1) to show that the particle spends
most of its time at high energy, where the spatial environment is practically
invisible; (2) to treat the low energy incursions where the motion is dominated
by the deterministic force, with potential drift but where symmetry arguments
cancel the ballistic behavior.
Comment: 55 pages. Some typos corrected from previous version.
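The superdiffusive scaling of the position can be seen from a short variance computation. This is a sketch only, assuming the rescaled momentum is exactly a Brownian motion $P_t = \sigma B_t$ started at zero; the symbols $\sigma$, $P_t$, $X_t$ and $B_t$ are notation introduced here for illustration and are not taken from the paper.

```latex
\[
X_t = \int_0^t P_s \, ds, \qquad
\operatorname{Var}(X_t)
  = \sigma^2 \int_0^t \int_0^t \min(s,u) \, ds \, du
  = \frac{\sigma^2 t^3}{3}.
\]
```

Under this assumption the typical displacement grows like $t^{3/2}$, faster than the $t^{1/2}$ growth of ordinary diffusion.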