316 research outputs found

    Video Question Answering via Attribute-Augmented Attention Network Learning

    Video Question Answering is a challenging problem in visual information retrieval, which provides the answer to the referenced video content according to the question. However, the existing visual question answering approaches mainly tackle the problem of static-image question answering, which may be ineffective for video question answering because they do not sufficiently model the temporal dynamics of video contents. In this paper, we study the problem of video question answering by modeling its temporal dynamics with a frame-level attention mechanism. We propose the attribute-augmented attention network learning framework that enables joint frame-level attribute detection and unified video representation learning for video question answering. We then incorporate a multi-step reasoning process into our proposed attention network to further improve the performance. We construct a large-scale video question answering dataset. We conduct experiments on both multiple-choice and open-ended video question answering tasks to show the effectiveness of the proposed method. Comment: Accepted for SIGIR 201

    Towards a more retail-friendly airport design: a two-step approach

    In recent years, the source of airport revenue has significantly changed. Accordingly, many airports have adjusted their strategies and focused on increasing retail revenue to improve financial sustainability. However, the literature review in this thesis identified two knowledge gaps: (1) empirical analyses on the effects of airport terminal design on retail revenue, and (2) application of general consumer shopping behaviour models to airport retail development. A two-step approach was developed. First, passenger shopping behaviour models were constructed based on two datasets collected at a case study airport: (1) eye-tracking data identified four types of passenger shopping behaviour—completely planned shoppers, partially planned shoppers, unplanned shoppers, and non-shoppers; (2) passenger questionnaire/interview data provided demographic and travel-related data to construct behaviour models. Second, the validity of the behaviour models was tested through an agent-based simulation model (ABSM) against the collected data. Next, the ABSM was used to examine the combined effects of passenger-related factors and terminal-related factors on retail revenue using five scenario studies. The results of the two-step approach revealed several significant findings. First, the passenger mix significantly affects retail revenue. Second, retail revenue could increase by 30% if passengers’ ‘visual distance’ was increased. Although passengers have limitations in their physical visual distance, it could be increased by providing information on retail offerings to passengers (e.g. interactive floor maps, mobile apps to provide retail information). Third, a 1% increase in dwell time could result in a 1.06% increase in retail revenue. Fourth, a sub-optimal terminal layout design could lead to a USD 57 million loss in potential annual retail revenue. Finally, adopting a centralised terminal layout could lead to a 7% increase in retail revenue. 
This thesis highlights the potential economic benefits of a well-designed terminal with a retail focus. In addition, this thesis demonstrates the feasibility and the potential of the proposed two-step approach in improving the existing retail configuration within airport terminals while maintaining the aeronautical functions. In conclusion, future terminal design guidelines could be improved by adopting the two-step approach in designing a more retail-friendly terminal, which will contribute to the financial sustainability of the airport business.
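The dwell-time finding above (a 1% increase in dwell time giving a 1.06% increase in retail revenue) reads like an elasticity of about 1.06. The following is a minimal sketch of how such a figure could be applied, assuming the relationship holds as a constant elasticity beyond the 1% case; that assumption and the function names are illustrative and do not come from the thesis.

```python
# Hypothetical helpers: treat the reported 1% -> 1.06% relationship as an
# elasticity of 1.06. Extrapolating beyond small changes is an assumption.
ELASTICITY = 1.06  # from the abstract: 1% more dwell time -> 1.06% more revenue


def revenue_change_linear(dwell_change_pct: float) -> float:
    """Small-change approximation: %dR is roughly elasticity * %d(dwell)."""
    return ELASTICITY * dwell_change_pct


def revenue_change_constant_elasticity(dwell_change_pct: float) -> float:
    """Constant-elasticity form: R scales as dwell ** elasticity."""
    ratio = 1.0 + dwell_change_pct / 100.0
    return (ratio ** ELASTICITY - 1.0) * 100.0


if __name__ == "__main__":
    for change in (1.0, 5.0, 10.0):
        print(change,
              round(revenue_change_linear(change), 2),
              round(revenue_change_constant_elasticity(change), 2))
```

The two forms agree closely for small changes and drift apart as the change grows, which is why the figure is best read as a local estimate.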

    Efficient inference for fully-connected CRFs with stationarity

    The Conditional Random Field (CRF) is a popular tool for object-based image segmentation. CRFs used in practice typically have edges only between adjacent image pixels. To represent object relationship statistics beyond adjacent pixels, prior work either represents only weak spatial information using the segmented regions, or encodes only global object co-occurrences. In this paper, we propose a unified model that augments the pixel-wise CRFs to capture object spatial relationships. To this end, we use a fully connected CRF, which has an edge for each pair of pixels. The edge potentials are defined to capture the spatial information and preserve the object boundaries at the same time. Traditional inference methods, such as belief propagation and graph cuts, are impractical in such a case where billions of edges are defined. Under only one assumption that the spatial relationships among different objects only depend on their relative positions (spatially stationary), we develop an efficient inference algorithm that converges in a few seconds on a standard resolution image, where belief propagation takes more than one hour for a single iteration.

    To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now

    The recent advances in diffusion models (DMs) have revolutionized the generation of complex and diverse images. However, these models also introduce potential safety hazards, such as the production of harmful content and infringement of data copyrights. Although there have been efforts to create safety-driven unlearning methods to counteract these challenges, doubts remain about their capabilities. To address this uncertainty, we propose an evaluation framework built upon adversarial attacks (also referred to as adversarial prompts), in order to discern the trustworthiness of these safety-driven unlearned DMs. Specifically, our research explores the (worst-case) robustness of unlearned DMs in eradicating unwanted concepts, styles, and objects, assessed by the generation of adversarial prompts. We develop a novel adversarial learning approach called UnlearnDiff that leverages the inherent classification capabilities of DMs to streamline the generation of adversarial prompts, making it as simple for DMs as it is for image classification attacks. Through comprehensive benchmarking, we assess the unlearning robustness of five prevalent unlearned DMs across multiple tasks. Our results underscore the effectiveness and efficiency of UnlearnDiff when compared to state-of-the-art adversarial prompting methods. Codes are available at https://github.com/OPTML-Group/Diffusion-MU-Attack. WARNING: This paper contains model outputs that may be offensive in nature.

    A Meta-analysis of Major Complications between Traditional Pacemakers and Leadless Pacemakers

    Objectives: We aim to compare the major complications between leadless pacemakers and traditional pacemakers. Background: Leadless pacemakers, which are increasingly used in clinical practice, have several advantages compared with traditional pacemakers in avoiding pocket- and lead-related complications. However, the clinical effect of leadless pacemakers remains controversial. Methods: PubMed, Embase, the Cochrane Central Register of Controlled Trials (CENTRAL), the CNKI database, and the Wanfang database were searched from July 2013 to December 2019. Studies comparing leadless pacemakers and traditional pacemakers were included. The primary end point was major complications. The secondary end points were cardiac perforation/pericardial effusion, device revision or extraction, loss of device function, and death. Results: Six studies fulfilled the inclusion criteria. Only four of the six studies reported data on major complications. Leadless pacemakers were associated with a lower incidence of major complications (risk ratio 0.33, 95% confidence interval 0.25–0.44, P < 0.00001, $I^2$ = 49%). We extracted data on cardiac perforation/pericardial effusion, device revision or extraction, loss of device function, and death from six studies. Our meta-analysis showed that leadless pacemakers have a higher risk of cardiac perforation or pericardial effusion (risk ratio 4.28, 95% confidence interval 1.66–11.08, P = 0.003, $I^2$ = 0%). No statistically significant differences were found for mortality, device revision or extraction, and loss of device function. Conclusion: Compared with traditional pacemakers, leadless pacemakers have a significantly decreased risk of major complications, but have a higher risk of cardiac perforation or pericardial effusion.
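As a reading aid, the reported P values can be cross-checked against the reported risk ratios and confidence intervals. The sketch below assumes the intervals are symmetric on the log scale and based on a normal approximation (a common convention, not something the abstract states); the function name is hypothetical.

```python
import math


def z_and_p_from_ci(rr: float, lower: float, upper: float, z_crit: float = 1.96):
    """Recover the z statistic and two-sided P value from a risk ratio and its 95% CI.

    Assumes the CI was built as exp(log(rr) +/- z_crit * SE), i.e. it is
    symmetric on the log scale. This is an assumption about the analysis.
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)  # SE of log(RR)
    z = math.log(rr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p


if __name__ == "__main__":
    # Values taken from the abstract above.
    print(z_and_p_from_ci(0.33, 0.25, 0.44))    # major complications
    print(z_and_p_from_ci(4.28, 1.66, 11.08))   # perforation / pericardial effusion
```

Under these assumptions the recovered P values are consistent with the reported ones: far below 0.00001 for major complications and about 0.003 for cardiac perforation or pericardial effusion.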

    Stripe order and spin dynamics in triangular-lattice antiferromagnet KErSe$_2$: A single-crystal study with a theoretical description

    The rare-earth triangular-lattice chalcogenide is a great platform for exploring both spin liquids and novel magnetic orders with anisotropic spin interactions and magnetic frustrations. Here, we report the thermodynamic and neutron scattering measurements of rare-earth triangular-lattice chalcogenide KErSe$_2$, using single-crystal samples. Our experiments revealed a long-range stripe order below 0.2 K. Although the magnetic order was three-dimensional, magnetic excitations exhibited negligible modulation along the z direction, indicating very weak interlayer coupling. Furthermore, magnetic excitation developed a well-defined spin-wave dispersion with a gap of $\sim$0.03 meV at M points. Both the stripe order and spin-wave excitations could be quantitatively understood from the anisotropic spin interactions of the Er$^{3+}$ Kramers doublets.

    Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning

    Massive data is often considered essential for deep learning applications, but it also incurs significant computational and infrastructural costs. Therefore, dataset pruning (DP) has emerged as an effective way to improve data efficiency by identifying and removing redundant training samples without sacrificing performance. In this work, we aim to address the problem of DP for transfer learning, i.e., how to prune a source dataset for improved pretraining efficiency and lossless finetuning accuracy on downstream target tasks. To the best of our knowledge, the problem of DP for transfer learning remains open, as previous studies have primarily addressed DP and transfer learning as separate problems. By contrast, we establish a unified viewpoint to integrate DP with transfer learning and find that existing DP methods are not suitable for the transfer learning paradigm. We then propose two new DP methods, label mapping and feature mapping, for supervised and self-supervised pretraining settings respectively, by revisiting the DP problem through the lens of source-target domain mapping. Furthermore, we demonstrate the effectiveness of our approach on numerous transfer learning tasks. We show that 40% to 80% of source data classes can be pruned without sacrificing downstream performance, resulting in a significant 2x to 5x speed-up during the pretraining stage. Besides, our proposal exhibits broad applicability and can improve other computationally intensive transfer learning techniques, such as adversarial pretraining. Codes are available at https://github.com/OPTML-Group/DP4TL. Comment: Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)

    On decoder-only architecture for speech-to-text and large language model integration

    Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has not been well explored. The "decoder-only" architecture has also not been well studied for speech processing tasks. In this research, we introduce Speech-LLaMA, a novel approach that effectively incorporates acoustic information into text-based large language models. Our method leverages Connectionist Temporal Classification and a simple audio encoder to map the compressed acoustic features to the continuous semantic space of the LLM. In addition, we further probe the decoder-only architecture for speech-to-text tasks by training a smaller-scale, randomly initialized Speech-LLaMA model from speech-text paired data alone. We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines, highlighting the potential advantages of decoder-only models for speech-to-text conversion.

    Quantitative analysis of water vapor budget of a persistent rainstorm event in Tongren of Guizhou Province

    This paper focuses on revealing the features of the water vapor transport, the water vapor budget, and the contribution percentages of water vapor source regions for the persistent rainstorm in Tongren from July 13th to 16th, 2014, based on precipitation observation data, ERA5 and NCEP GDAS (National Centers for Environmental Prediction, Global Data Assimilation System) reanalysis data, and the HYSPLIT4 (Hybrid Single-Particle Lagrangian Integrated Trajectory) model. The results show that: (1) The eastward South Asian High and the coupling of the high- and low-level jets enhanced low-level convergence and high-level divergence, which favored the convergence of water vapor in the target region, its condensation, and the formation of precipitation. (2) Water vapor over the ocean was continuously transported to the rainstorm area through a water vapor channel built by the combined effect of the subtropical high, which stably controlled the south of Guizhou Province, a shortwave trough lying on the northwest side of the subtropical high, and a tropical cyclone over the Indian Peninsula at 500 hPa. (3) According to the 120 h backward trajectory simulation, the air particles in the rainstorm area mainly came from the Arabian Sea, the Bay of Bengal, and the South China Sea at lower heights, while a few particles came from the region north of Tongren to Eurasia and the Atlantic Ocean at higher heights. (4) The contribution percentages of the water vapor source regions south of Tongren to the South China Sea and its nearby islands and waters, east of the Indian Peninsula to the Bay of Bengal, and the Arabian Sea to the west of the Indian Peninsula were 48.29%, 32.17%, and 10.47%, respectively. In addition, water vapor from the region north of Tongren to Eurasia and the Atlantic Ocean also contributed to the rainstorm in Tongren (9.07%).
(5) The 850 hPa and 700 hPa levels were the main water vapor contribution levels, providing nearly 3/4 of the water vapor to the rainstorm area; the remaining quarter was transported at 500 hPa.
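The four source-region percentages quoted in items (4) can be checked for closure with a few lines of arithmetic. This is only a bookkeeping sketch: the dictionary keys are shortened labels chosen here for readability, not names used in the paper.

```python
# Source-region contributions (percent) as quoted in the abstract.
# Keys are shortened, illustrative labels.
contributions = {
    "south of Tongren to South China Sea": 48.29,
    "east of Indian Peninsula to Bay of Bengal": 32.17,
    "Arabian Sea to west of Indian Peninsula": 10.47,
    "north of Tongren to Eurasia and Atlantic": 9.07,
}

# Round to two decimals to avoid floating-point noise in the sum.
total = round(sum(contributions.values()), 2)

# Rank regions from largest to smallest contribution.
ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    print("total:", total)
    for region, share in ranked:
        print(f"{share:6.2f}%  {region}")
```

The shares sum to 100%, so the four regions together account for the whole traced water vapor supply in this budget.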