Video Question Answering via Attribute-Augmented Attention Network Learning
Video Question Answering is a challenging problem in visual information
retrieval: given a question, the system must provide an answer grounded in the
referenced video content. However, existing visual question answering approaches
mainly tackle static image question answering, and they may be ineffective
for video question answering because they do not adequately model the temporal
dynamics of video content. In this paper, we study video question answering by
modeling its temporal dynamics with a frame-level attention mechanism. We
propose the attribute-augmented attention network learning framework, which
enables joint frame-level attribute detection and unified video representation
learning for video question answering. We then incorporate a multi-step
reasoning process into the proposed attention network to further improve
performance. We also construct a large-scale video question answering dataset
and conduct experiments on both multiple-choice and open-ended video question
answering tasks to demonstrate the effectiveness of the proposed method.
Comment: Accepted for SIGIR 201
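Frame-level attention can be sketched as a question-conditioned softmax over per-frame feature vectors, followed by a weighted sum that yields one video vector. This is a minimal sketch under stated assumptions, not the authors' network: the scaled dot-product scoring, the concatenation used for "attribute augmentation" (`augment`), and the query-refinement rule in `multi_step_attention` are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def augment(visual, attributes):
    """Hypothetical attribute augmentation: concatenate visual features with attribute scores."""
    return list(visual) + list(attributes)


def frame_attention(frames, question):
    """Question-guided attention over T frame vectors of dimension d.

    Returns (weights, pooled): one weight per frame and the weighted sum of the frames.
    """
    d = len(question)
    # Hypothetical scoring: scaled dot product between each frame and the question vector.
    scores = [sum(f * q for f, q in zip(frame, question)) / math.sqrt(d) for frame in frames]
    weights = softmax(scores)
    pooled = [sum(w * frame[k] for w, frame in zip(weights, frames)) for k in range(d)]
    return weights, pooled


def multi_step_attention(frames, question, steps=2):
    """Hypothetical multi-step reasoning: add each step's pooled vector to the query."""
    query = list(question)
    weights = []
    for _ in range(steps):
        weights, pooled = frame_attention(frames, query)
        query = [q + p for q, p in zip(query, pooled)]
    return weights, query
```

For example, with frames built as `augment([1.0], [0.0])`, `augment([0.0], [1.0])`, and `augment([1.0], [1.0])` and the question vector `[2.0, 0.0]`, the first and third frames receive equal, larger weights than the second, because only they align with the question's active dimension.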
Towards a more retail-friendly airport design: a two-step approach
In recent years, the source of airport revenue has significantly changed. Accordingly, many airports have adjusted their strategies and focused on increasing retail revenue to improve financial sustainability. However, the literature review in this thesis identified two knowledge gaps: (1) empirical analyses on the effects of airport terminal design on retail revenue, and (2) application of general consumer shopping behaviour models to airport retail development.
A two-step approach was developed. First, passenger shopping behaviour models were constructed based on two datasets collected at a case study airport: (1) eye-tracking data identified four types of passenger shopping behaviour—completely planned shoppers, partially planned shoppers, unplanned shoppers, and non-shoppers; (2) passenger questionnaire/interview data provided demographic and travel-related data to construct behaviour models. Second, the validity of the behaviour models was tested through an agent-based simulation model (ABSM) against the collected data. Next, the ABSM was used to examine the combined effects of passenger-related factors and terminal-related factors on retail revenue using five scenario studies.
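The agent-based step can be illustrated with a deliberately tiny model in which each passenger agent belongs to one of the four behaviour types and can only shop at stores within its "visual distance". This is a hypothetical sketch, not the thesis's ABSM: the purchase probabilities, spend values, and the one-dimensional concourse are placeholders invented for illustration, not values from the collected data.

```python
import random
from dataclasses import dataclass

# Hypothetical purchase probabilities per passenger type (placeholders, NOT thesis values).
BUY_PROBABILITY = {
    "completely_planned": 0.9,
    "partially_planned": 0.5,
    "unplanned": 0.2,
    "non_shopper": 0.0,
}


@dataclass
class Shop:
    position: float  # location along a one-dimensional concourse, in metres
    spend: float     # average spend per purchase (hypothetical)


@dataclass
class Passenger:
    kind: str               # one of the four behaviour types in BUY_PROBABILITY
    position: float         # where the passenger waits, in metres
    visual_distance: float  # how far away a shop can be and still be noticed

    def visible_shops(self, shops):
        """Shops within this passenger's visual distance."""
        return [s for s in shops if abs(s.position - self.position) <= self.visual_distance]


def simulate_revenue(passengers, shops, seed=0):
    """Sum spend over every (passenger, visible shop) purchase decision."""
    rng = random.Random(seed)
    revenue = 0.0
    for passenger in passengers:
        probability = BUY_PROBABILITY[passenger.kind]
        for shop in passenger.visible_shops(shops):
            if rng.random() < probability:
                revenue += shop.spend
    return revenue
```

A scenario study in this toy setting amounts to running `simulate_revenue` twice with different `visual_distance` values (or a different passenger mix, or different shop positions) and comparing the totals; the real ABSM was validated against the eye-tracking and questionnaire data before being used this way.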
The results of the two-step approach revealed several significant findings. First, the passenger mix significantly affects retail revenue. Second, retail revenue could increase by 30% if passengers’ ‘visual distance’ was increased. Although passengers have limitations in their physical visual distance, it could be increased by providing information on retail offerings to passengers (e.g. interactive floor maps, mobile apps to provide retail information). Third, a 1% increase in dwell time could result in a 1.06% increase in retail revenue. Fourth, a sub-optimal terminal layout design could lead to a USD 57 million loss in potential annual retail revenue. Finally, adopting a centralised terminal layout could lead to a 7% increase in retail revenue.
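The dwell-time finding is an elasticity of about 1.06 (a 1% change in dwell time gives a 1.06% change in revenue). The sketch below shows two common ways to read such a figure; which one matches the thesis's simulation outside the reported 1% change is an assumption, and neither function is the thesis's model.

```python
def revenue_change_linear(dwell_change_pct, elasticity=1.06):
    """Point-elasticity approximation: % change in revenue ~= elasticity * % change in dwell time."""
    return elasticity * dwell_change_pct


def revenue_change_constant_elasticity(dwell_change_pct, elasticity=1.06):
    """Assumed constant-elasticity model R = k * T**elasticity; the constant k cancels out."""
    ratio = 1.0 + dwell_change_pct / 100.0
    return (ratio ** elasticity - 1.0) * 100.0
```

For a 1% increase both readings give about 1.06%. They diverge slowly for larger changes: a 10% increase gives 10.6% under the linear reading and about 10.63% under the constant-elasticity reading, so extrapolating far beyond the simulated range should be done with care.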
This thesis highlights the potential economic benefits of a well-designed terminal with a retail focus. In addition, it demonstrates the feasibility and potential of the proposed two-step approach for improving the existing retail configuration within airport terminals while maintaining aeronautical functions. In conclusion, future terminal design guidelines could be improved by adopting the two-step approach to design more retail-friendly terminals, which would contribute to the financial sustainability of the airport business.
Efficient inference for fully-connected CRFs with stationarity
The Conditional Random Field (CRF) is a popular tool for object-based image segmentation. CRFs used in practice typically have edges only between adjacent image pixels. To represent object relationship statistics beyond adjacent pixels, prior work either represents only weak spatial information using the segmented regions or encodes only global object co-occurrences. In this paper, we propose a unified model that augments pixel-wise CRFs to capture object spatial relationships. To this end, we use a fully connected CRF, which has an edge for each pair of pixels. The edge potentials are defined to capture spatial information and preserve object boundaries at the same time. Traditional inference methods, such as belief propagation and graph cuts, are impractical in such a case, where billions of edges are defined. Under the single assumption that the spatial relationships among different objects depend only on their relative positions (spatial stationarity), we develop an efficient inference algorithm that converges in a few seconds on a standard-resolution image, whereas belief propagation takes more than one hour for a single iteration.
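Stationarity matters because a pairwise cost that depends only on the offset between two pixels turns the sum over all pixel pairs into a convolution, which an FFT evaluates in O(n log n) instead of O(n²). The sketch below shows this inside one mean-field update for a 1-D fully connected CRF. It is an illustration of one way to exploit stationarity, assumed here, not necessarily the authors' exact algorithm; it also assumes circular (wrap-around) offsets and a power-of-two length, whereas real images would be 2-D and zero-padded.

```python
import cmath
import math


def fft(values, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circular_convolve(signal, kernel):
    """c[i] = sum_j kernel[(i - j) % n] * signal[j], computed via FFT."""
    n = len(signal)
    spectrum = [a * b for a, b in zip(fft(signal), fft(kernel))]
    return [v.real / n for v in fft(spectrum, inverse=True)]


def mean_field_step(unary, q, kernels):
    """One mean-field update for a 1-D fully connected CRF with stationary pairwise costs.

    unary[l][i]      cost of label l at pixel i
    q[l][i]          current marginal probability of label l at pixel i
    kernels[l][m][d] cost between labels l and m at circular offset d;
                     kernels[l][m][0] should be 0 so a pixel does not interact with itself.
    """
    num_labels, n = len(unary), len(unary[0])
    energy = [list(row) for row in unary]
    for l in range(num_labels):
        for m in range(num_labels):
            # Stationarity turns the sum over all O(n^2) pixel pairs into one convolution.
            message = circular_convolve(q[m], kernels[l][m])
            for i in range(n):
                energy[l][i] += message[i]
    new_q = [[0.0] * n for _ in range(num_labels)]
    for i in range(n):
        lowest = min(energy[l][i] for l in range(num_labels))
        weights = [math.exp(-(energy[l][i] - lowest)) for l in range(num_labels)]
        total = sum(weights)
        for l in range(num_labels):
            new_q[l][i] = weights[l] / total
    return new_q
```

Iterating `mean_field_step` until the marginals stop changing gives approximate inference; each iteration costs a handful of FFTs per label pair rather than a pass over every pixel pair.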
To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
Recent advances in diffusion models (DMs) have revolutionized the
generation of complex and diverse images. However, these models also introduce
potential safety hazards, such as the production of harmful content and
infringement of data copyrights. Although there have been efforts to create
safety-driven unlearning methods to counteract these challenges, doubts remain
about their capabilities. To resolve this uncertainty, we propose an evaluation
framework built upon adversarial attacks (also referred to as adversarial
prompts) to discern the trustworthiness of these safety-driven
unlearned DMs. Specifically, our research explores the (worst-case) robustness
of unlearned DMs in eradicating unwanted concepts, styles, and objects,
assessed by the generation of adversarial prompts. We develop a novel
adversarial learning approach called UnlearnDiff that leverages the inherent
classification capabilities of DMs to streamline the generation of adversarial
prompts, making it as simple for DMs as it is for image classification attacks.
Through comprehensive benchmarking, we assess the unlearning
robustness of five prevalent unlearned DMs across multiple tasks. Our results
underscore the effectiveness and efficiency of UnlearnDiff when compared to
state-of-the-art adversarial prompting methods. Code is available at
https://github.com/OPTML-Group/Diffusion-MU-Attack. WARNING: This paper
contains model outputs that may be offensive in nature.
A Meta-analysis of Major Complications between Traditional Pacemakers and Leadless Pacemakers
Objectives: We aim to compare the major complications between leadless pacemakers and traditional pacemakers. Background: Leadless pacemakers, which are increasingly used in clinical practice, have several advantages compared with traditional pacemakers in avoiding pocket- and lead-related complications. However, the clinical effect of leadless pacemakers remains controversial. Methods: PubMed, Embase, the Cochrane Central Register of Controlled Trials (CENTRAL), the CNKI database, and the Wanfang database were searched from July 2013 to December 2019. Studies comparing leadless pacemakers and traditional pacemakers were included. The primary end point was major complications. The secondary end points were cardiac perforation/pericardial effusion, device revision or extraction, loss of device function, and death. Results: Six studies fulfilled the inclusion criteria. Only four of the six studies reported data on major complications. Leadless pacemakers were associated with a lower incidence of major complications (risk ratio 0.33, 95% confidence interval 0.25–0.44, P<0.00001, I² = 49%). We extracted data on cardiac perforation/pericardial effusion, device revision or extraction, loss of device function, and death from six studies. Our meta-analysis showed that leadless pacemakers have a higher risk of cardiac perforation or pericardial effusion (risk ratio 4.28, 95% confidence interval 1.66–11.08, P=0.003, I² = 0%). No statistically significant differences were found for mortality, device revision or extraction, and loss of device function. Conclusion: Compared with traditional pacemakers, leadless pacemakers have a significantly decreased risk of major complications, but have a higher risk of cardiac perforation or pericardial effusion.
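The abstract reports pooled risk ratios with 95% confidence intervals and the I² heterogeneity statistic, but it does not state which pooling model was used. As a sketch, assuming inverse-variance fixed-effect pooling on the log scale (meta-analysis software such as RevMan also offers Mantel-Haenszel and random-effects models), the reported quantities can be computed as below. The study counts used in the tests are hypothetical, and studies with zero events in an arm would need a continuity correction, which this sketch omits.

```python
import math


def log_risk_ratio(events_a, total_a, events_b, total_b):
    """Log risk ratio (group A vs group B) and its standard error for one study."""
    rr = (events_a / total_a) / (events_b / total_b)
    se = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    return math.log(rr), se


def pool_fixed_effect(studies, z=1.96):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2 (in percent).

    studies: list of (events_a, total_a, events_b, total_b) tuples.
    """
    estimates = [log_risk_ratio(*s) for s in studies]
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    # Cochran's Q measures dispersion of study estimates around the pooled value.
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, estimates))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "rr": math.exp(pooled),
        "ci": (math.exp(pooled - z * pooled_se), math.exp(pooled + z * pooled_se)),
        "i2": i2,
    }
```

A pooled risk ratio below 1 (as for major complications, 0.33) favours the leadless group, and a confidence interval that excludes 1 corresponds to a statistically significant difference; I² expresses the share of between-study variation attributable to heterogeneity rather than chance.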
Stripe order and spin dynamics in triangular-lattice antiferromagnet KErSe₂: A single-crystal study with a theoretical description
The rare-earth triangular-lattice chalcogenides are a great platform for
exploring both spin liquids and novel magnetic orders with anisotropic spin
interactions and magnetic frustration. Here, we report thermodynamic and
neutron scattering measurements on single-crystal samples of the rare-earth
triangular-lattice chalcogenide KErSe₂. Our experiments revealed a
long-range stripe order below 0.2 K. Although the magnetic order was
three-dimensional, the magnetic excitations exhibited negligible modulation
along the z direction, indicating very weak interlayer coupling. Furthermore,
the magnetic excitations developed a well-defined spin-wave dispersion with a
gap of 0.03 meV at the M points. Both the stripe order and the spin-wave
excitations could be quantitatively understood from the anisotropic spin
interactions of the Er Kramers doublets.
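The abstract does not write out the spin Hamiltonian. For orientation, the generic symmetry-allowed nearest-neighbour model commonly used for effective spin-1/2 Kramers doublets on a triangular lattice is shown below, with effective spin operators S_i on site i and a bond-dependent phase γ_ij taking one value for each of the three bond directions. Sign and phase conventions differ between papers, and that this study used exactly this parametrization is an assumption.

```latex
\mathcal{H} = \sum_{\langle ij \rangle} \Big[ J_{zz} S_i^z S_j^z
  + J_{\pm} \left( S_i^+ S_j^- + S_i^- S_j^+ \right)
  + J_{\pm\pm} \left( \gamma_{ij} S_i^+ S_j^+ + \gamma_{ij}^* S_i^- S_j^- \right)
  - \frac{i J_{z\pm}}{2} \left( \gamma_{ij}^* S_i^+ S_j^z - \gamma_{ij} S_i^- S_j^z
  + \gamma_{ij}^* S_i^z S_j^+ - \gamma_{ij} S_i^z S_j^- \right) \Big],
\qquad \gamma_{ij} \in \{ 1,\; e^{i 2\pi/3},\; e^{-i 2\pi/3} \}
```

The M points are the midpoints of the edges of the hexagonal Brillouin zone; a collinear stripe order on the triangular lattice has its ordering wavevector at an M point, which is why the spin-wave gap is quoted there.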
Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning
Massive data is often considered essential for deep learning applications,
but it also incurs significant computational and infrastructural costs.
Therefore, dataset pruning (DP) has emerged as an effective way to improve data
efficiency by identifying and removing redundant training samples without
sacrificing performance. In this work, we aim to address the problem of DP for
transfer learning, i.e., how to prune a source dataset for improved pretraining
efficiency and lossless finetuning accuracy on downstream target tasks. To the
best of our knowledge, the problem of DP for transfer learning remains open, as
previous studies have primarily addressed DP and transfer learning as separate
problems. By contrast, we establish a unified viewpoint to integrate DP with
transfer learning and find that existing DP methods are not suitable for the
transfer learning paradigm. We then propose two new DP methods, label mapping
and feature mapping, for supervised and self-supervised pretraining settings
respectively, by revisiting the DP problem through the lens of source-target
domain mapping. Furthermore, we demonstrate the effectiveness of our approach
on numerous transfer learning tasks. We show that source data classes can be
pruned by up to 40% to 80% without sacrificing downstream performance, resulting
in a significant 2 to 5 times speed-up during the pretraining stage. In addition,
our proposal exhibits broad applicability and can improve other computationally
intensive transfer learning techniques, such as adversarial pretraining. Code
is available at https://github.com/OPTML-Group/DP4TL.
Comment: Thirty-seventh Conference on Neural Information Processing Systems
(NeurIPS 2023)
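The label-mapping idea can be sketched as ranking source classes by how often target samples are mapped onto them by a source-pretrained classifier, then keeping only the most-used classes. This is a simplified sketch: the abstract describes the method only at a high level, so the frequency-based scoring and the tie-breaking rule here are assumptions, and the predictions are taken as given rather than produced by an actual model.

```python
from collections import Counter


def label_mapping_prune(target_predictions, num_source_classes, prune_ratio):
    """Return the source classes to keep, ranked by how often target samples map to them.

    target_predictions: source-class index predicted for each target sample
    prune_ratio: fraction of source classes to remove (e.g. 0.4 to 0.8)
    """
    counts = Counter(target_predictions)
    # Most frequently mapped classes first; ties broken by class index for determinism.
    ranked = sorted(range(num_source_classes), key=lambda c: (-counts[c], c))
    keep = num_source_classes - round(prune_ratio * num_source_classes)
    return sorted(ranked[:keep])


def prune_source_dataset(source_samples, kept_classes):
    """Drop every (sample, label) pair whose label is not in kept_classes."""
    kept = set(kept_classes)
    return [(x, y) for x, y in source_samples if y in kept]
```

Pretraining then runs on the pruned source set only, which is where the reported pretraining speed-up comes from; classes that no target sample maps to are the first to go.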
On decoder-only architecture for speech-to-text and large language model integration
Large language models (LLMs) have achieved remarkable success in the field of
natural language processing, enabling better human-computer interaction using
natural language. However, the seamless integration of speech signals into LLMs
remains underexplored, and the "decoder-only" architecture has also not been
well studied for speech processing tasks. In this research, we introduce
Speech-LLaMA, a novel approach that effectively incorporates acoustic
information into text-based large language models. Our method leverages
Connectionist Temporal Classification (CTC) and a simple audio encoder to map
the compressed acoustic features to the continuous semantic space of the LLM.
In addition, we probe the decoder-only architecture for speech-to-text tasks
by training a smaller-scale, randomly initialized Speech-LLaMA model on
speech-text paired data alone. We conduct experiments on multilingual
speech-to-text translation tasks and demonstrate a significant improvement over
strong baselines, highlighting the potential advantages of decoder-only models
for speech-to-text conversion.
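CTC-based compression shortens the acoustic frame sequence before it reaches the LLM by using the greedy CTC label of each frame. The sketch below shows two plausible variants, dropping blank frames and additionally averaging consecutive frames that share a non-blank label; which variant Speech-LLaMA uses, and the blank index of 0, are assumptions made for illustration.

```python
def ctc_compress(frames, ctc_labels, blank=0, mode="average"):
    """Shorten a frame sequence using per-frame greedy CTC labels.

    mode="remove_blanks": drop frames whose CTC label is blank.
    mode="average": also average consecutive frames that share the same non-blank label.
    """
    if mode not in ("remove_blanks", "average"):
        raise ValueError(f"unknown mode: {mode}")
    segments = []
    previous = None
    for feature, label in zip(frames, ctc_labels):
        if label == blank:
            previous = None  # a blank separates repeated labels into distinct tokens
            continue
        if mode == "average" and label == previous:
            segments[-1].append(feature)
        else:
            segments.append([feature])
        previous = label
    # Average the feature vectors within each segment, dimension by dimension.
    return [[sum(column) / len(segment) for column in zip(*segment)] for segment in segments]
```

Resetting `previous` on a blank follows the CTC convention that a repeated label separated by a blank denotes two tokens, so those frames are kept as separate segments rather than merged.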
Quantitative analysis of water vapor budget of a persistent rainstorm event in Tongren of Guizhou Province
This paper focuses on revealing the features of water vapor transport, the water vapor budget, and the percentage contributions of water vapor source regions for the persistent rainstorm in Tongren from July 13 to 16, 2014, based on precipitation observation data, ERA5 and NCEP GDAS (National Centers for Environmental Prediction, Global Data Assimilation System) reanalysis data, and the HYSPLIT4 (Hybrid Single-Particle Lagrangian Integrated Trajectory) model. The results show that: (1) The eastward South Asian High and the coupling of high- and low-level jets enhanced the dynamic mechanism of low-level convergence and high-level divergence, which is conducive to the convergence of water vapor in the target region, where it condenses to form precipitation. (2) Water vapor over the ocean was continuously transported to the rainstorm area through a water vapor channel built by the synergistic effect of the subtropical high, which stably controlled the south of Guizhou Province, a shortwave trough lying on the northwest side of the subtropical high, and a tropical cyclone over the Indian Peninsula at 500 hPa. (3) According to 120 h backward trajectory simulations, the air particles in the rainstorm area mainly came from the Arabian Sea, the Bay of Bengal, and the South China Sea at lower heights, while a few particles came from the region extending from north of Tongren to Eurasia and the Atlantic Ocean at higher heights. (4) The percentage contributions of the water vapor source regions (the south of Tongren to the South China Sea and its nearby islands and waters; the east of the Indian Peninsula to the Bay of Bengal; and the Arabian Sea to the west of the Indian Peninsula) were 48.29%, 32.17%, and 10.47%, respectively. In addition, water vapor from the north of Tongren to Eurasia and the Atlantic Ocean also contributed to the rainstorm in Tongren (9.07%).
(5) The 850 hPa and 700 hPa levels were the main water vapor contribution levels, providing nearly 3/4 of the water vapor to the rainstorm area; the remaining quarter was transported at 500 hPa.
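Trajectory-based source contributions are commonly computed by summing the specific humidity carried by the trajectories (or trajectory clusters) from each source region and dividing by the total over all trajectories. The sketch below assumes that formula and assumes the humidity is taken where each trajectory arrives in the target area; the abstract does not state which variant was used, and the inputs in the tests are hypothetical.

```python
from collections import defaultdict


def source_contributions(trajectories):
    """Percentage moisture contribution of each source region.

    trajectories: iterable of (region, specific_humidity) pairs, one per trajectory,
    with specific humidity (e.g. g/kg) taken at the trajectory's arrival in the target area.
    contribution(region) = 100 * sum of q over that region's trajectories / sum of q over all.
    """
    totals = defaultdict(float)
    for region, q in trajectories:
        totals[region] += q
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total specific humidity must be positive")
    return {region: 100.0 * q_sum / grand_total for region, q_sum in totals.items()}
```

Because every region's share is taken from the same total, the shares add up to 100%; the four values reported above (48.29 + 32.17 + 10.47 + 9.07) sum to exactly 100.00%, consistent with this kind of partition.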
- …