MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System
Multi-modal sarcasm detection has attracted much recent attention.
Nevertheless, the existing benchmark (MMSD) has shortcomings that hinder
the development of reliable multi-modal sarcasm detection systems: (1) MMSD
contains spurious cues that lead to biased model learning; (2) the negative
samples in MMSD are not always reasonable. To solve these issues, we
introduce MMSD2.0, a corrected dataset that fixes the shortcomings of MMSD
by removing the spurious cues and re-annotating the unreasonable samples.
samples. Meanwhile, we present a novel framework called multi-view CLIP that is
capable of leveraging multi-grained cues from multiple perspectives (i.e.,
text, image, and text-image interaction view) for multi-modal sarcasm
detection. Extensive experiments show that MMSD2.0 is a valuable benchmark for
building reliable multi-modal sarcasm detection systems and multi-view CLIP can
significantly outperform the previous best baselines.
Comment: Accepted by ACL 2023 Findings
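The multi-view idea, scoring a text view, an image view, and a text-image interaction view and then fusing the three, can be sketched as follows. This is a minimal illustration with random stand-ins for CLIP embeddings and learned classifier heads; all names, the interaction features, and the fusion weights are assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 512  # CLIP-style embedding dimension (assumption)

# Random stand-ins for learned per-view classifier heads.
head_text = rng.normal(size=D) / np.sqrt(D)
head_image = rng.normal(size=D) / np.sqrt(D)
head_inter = rng.normal(size=2 * D) / np.sqrt(2 * D)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multi_view_score(text_emb, image_emb, weights=(1/3, 1/3, 1/3)):
    """Fuse text, image, and text-image interaction views into one
    sarcasm probability (hypothetical late-fusion sketch)."""
    # Interaction view: element-wise product plus absolute difference.
    inter = np.concatenate([text_emb * image_emb,
                            np.abs(text_emb - image_emb)])
    logits = np.array([
        head_text @ text_emb,    # text-only view
        head_image @ image_emb,  # image-only view
        head_inter @ inter,      # text-image interaction view
    ])
    return sigmoid(np.dot(weights, logits))

# In practice the embeddings would come from CLIP's two encoders;
# here they are random unit vectors.
t = rng.normal(size=D); t /= np.linalg.norm(t)
v = rng.normal(size=D); v /= np.linalg.norm(v)
p = multi_view_score(t, v)
```

The fusion here is a fixed weighted sum; a trained model would learn both the heads and the fusion weights end to end.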
Pre-training on Synthetic Driving Data for Trajectory Prediction
Accumulating substantial volumes of real-world driving data is pivotal for
trajectory forecasting in autonomous driving. Given the heavy
reliance of current trajectory forecasting models on data-driven methodologies,
we aim to tackle the challenge of learning general trajectory forecasting
representations under limited data availability. We propose to augment both HD
maps and trajectories and apply pre-training strategies on top of them.
Specifically, we take advantage of graph representations of HD maps and apply
vector transformations to reshape the maps, easily enriching the limited number
of scenes. Additionally, we employ a rule-based model to generate trajectories
on the augmented scenes, thus enlarging the set of trajectories beyond the collected
real ones. To foster the learning of general representations within this
augmented dataset, we comprehensively explore the different pre-training
strategies, including extending the concept of a Masked AutoEncoder (MAE) for
trajectory forecasting. Extensive experiments demonstrate the effectiveness of
our data expansion and pre-training strategies, which outperform the baseline
prediction model by large margins, e.g. 5.04%, 3.84% and 8.30% on the three
evaluation metrics.
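The two augmentation steps, transforming HD-map geometry to synthesize new scenes and MAE-style masking of trajectory waypoints for pre-training, can be sketched as follows. This is a hypothetical NumPy illustration; the function names, the 2-D rotate/mirror transform, and the mask ratio are assumptions, not the paper's pipeline.

```python
import numpy as np

def augment_polylines(polylines, angle_deg, flip=False):
    """Rotate (and optionally mirror) HD-map polylines about the origin
    to synthesize new scenes from a limited set of real ones."""
    theta = np.deg2rad(angle_deg)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    if flip:
        R = R @ np.diag([1.0, -1.0])  # mirror across the x-axis
    return [pl @ R.T for pl in polylines]

def mask_trajectory(traj, mask_ratio=0.5, rng=None):
    """MAE-style masking: hide a random subset of waypoints.
    Returns the visible points, their indices, and the boolean mask;
    a reconstruction head would be trained to predict the hidden points."""
    rng = rng or np.random.default_rng()
    n = len(traj)
    n_mask = int(round(mask_ratio * n))
    idx = rng.permutation(n)
    masked = np.zeros(n, dtype=bool)
    masked[idx[:n_mask]] = True
    return traj[~masked], np.flatnonzero(~masked), masked
```

Rigid transforms preserve road geometry (lengths and angles), so the augmented scenes remain physically plausible inputs for the rule-based trajectory generator.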
First Steps Toward an Autonomous Accelerator, a Common Project Between DESY and KIT
Reinforcement Learning algorithms have risen in popularity in recent years in the accelerator physics community, showing potential in beam control and in the optimization and automation of tasks in accelerator operation. The Helmholtz AI project "Machine Learning toward Autonomous Accelerators" is a collaboration between DESY and KIT that works on investigating and developing RL applications for the automatic start-up of electron linear accelerators. The work is carried out in parallel at two similar research accelerators: ARES at DESY and FLUTE at KIT, offering a unique opportunity for transfer learning between facilities. One of the first steps of this project is the establishment of a common interface between the simulations and the machine, in order to test and apply various optimization approaches interchangeably between the two accelerators. In this paper, we present the first results on the common interface and its application to beam focusing in ARES, and the idea of laser shaping with spatial light modulators at FLUTE.
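A common simulation/machine interface of the kind described above is typically structured like a gym-style environment: the optimizer only sees reset/step, while the backend hides whether magnets are set in a simulation or on the real accelerator. The sketch below is purely illustrative; the class names, the three-magnet action space, and the toy beam-size model are assumptions, not the project's actual API.

```python
import numpy as np

class TransverseTuningEnv:
    """Gym-style wrapper shared by simulation and machine backends
    (hypothetical sketch). Actions set quadrupole strengths; the
    observation is the measured beam size; reward is negative beam size,
    so maximizing reward focuses the beam."""

    def __init__(self, backend):
        # Any backend exposing set_magnets() and read_beam_size() works,
        # so optimizers can move between simulation and machine unchanged.
        self.backend = backend

    def reset(self):
        self.backend.set_magnets(np.zeros(3))
        return np.array([self.backend.read_beam_size()])

    def step(self, action):
        self.backend.set_magnets(np.asarray(action, dtype=float))
        size = self.backend.read_beam_size()
        reward = -size
        done = size < 0.1  # arbitrary focusing target for this toy
        return np.array([size]), reward, done, {}

class ToySimulationBackend:
    """Stand-in simulation: beam size is minimal at a known setting."""
    def __init__(self):
        self.optimum = np.array([0.4, -0.2, 0.1])
        self.settings = np.zeros(3)
    def set_magnets(self, k):
        self.settings = k
    def read_beam_size(self):
        return 0.05 + float(np.sum((self.settings - self.optimum) ** 2))

env = TransverseTuningEnv(ToySimulationBackend())
obs = env.reset()
```

Swapping `ToySimulationBackend` for a machine-control backend is the point of the shared interface: the tuning algorithm never changes.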
Machine Learning Based Spatial Light Modulator Control for the Photoinjector Laser at FLUTE
FLUTE (Ferninfrarot Linac- und Test-Experiment) at KIT is a compact linac-based test facility for novel accelerator technology and a source of intense THz radiation. FLUTE is designed to provide a wide range of electron bunch charges from the pC- to nC-range, high electric fields up to 1.2 GV/m, and ultra-short THz pulses down to the fs-timescale. The electrons are generated at the RF photoinjector, where the electron gun is driven by a commercial titanium-sapphire laser. In this kind of setup the electron beam properties are determined by the photoinjector, but more importantly by the characteristics of the laser pulses. Spatial light modulators can be used to shape the laser pulse transversely and longitudinally, offering a flexible way to shape the laser beam and, subsequently, the electron beam, influencing the produced THz pulses. However, nonlinear effects inherent to the laser manipulation (transportation, compression, third-harmonic generation) can distort the original pulse. In this paper we propose to use machine learning methods to manipulate the laser and electron bunch, aiming to generate tailor-made THz pulses. The method is demonstrated experimentally in a test setup.
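Transverse shaping of the kind an SLM performs can be illustrated with a toy amplitude mask that flattens a Gaussian intensity profile into a flat-top. This is a simplified sketch: a real SLM imposes a programmable phase pattern and the paper's ML method learns that pattern, neither of which is modelled here; grid size, beam waist, and target level are arbitrary assumptions.

```python
import numpy as np

def gaussian_beam(n=64, w=0.5):
    """Normalized Gaussian intensity profile on an n x n grid over [-1, 1]^2."""
    x = np.linspace(-1.0, 1.0, n)
    X, Y = np.meshgrid(x, x)
    return np.exp(-(X**2 + Y**2) / w**2)

def flat_top_mask(beam, target_level=0.5):
    """Amplitude mask that attenuates the beam centre so the shaped output
    is clipped at `target_level` of the peak, approximating a flat-top."""
    return np.minimum(1.0, target_level / np.maximum(beam, 1e-12))

beam = gaussian_beam()
mask = flat_top_mask(beam)
shaped = beam * mask  # flat plateau at 0.5, unchanged Gaussian wings
```

An optimizer (or the proposed ML method) would instead search over the mask itself, scoring each candidate by the resulting electron-beam or THz diagnostics.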
Learning to Do or Learning While Doing: Reinforcement Learning and Bayesian Optimisation for Online Continuous Tuning
Online tuning of real-world plants is a complex optimisation problem that
continues to require manual intervention by experienced human operators.
Autonomous tuning is a rapidly expanding field of research, where
learning-based methods, such as Reinforcement Learning-trained Optimisation
(RLO) and Bayesian optimisation (BO), hold great promise for achieving
outstanding plant performance and reducing tuning times. Which algorithm to
choose in different scenarios, however, remains an open question. Here we
present a comparative study using a routine task in a real particle accelerator
as an example, showing that RLO generally outperforms BO, but is not always the
best choice. Based on the study's results, we provide a clear set of criteria
to guide the choice of algorithm for a given tuning task. These can ease the
adoption of learning-based autonomous tuning solutions to the operation of
complex real-world plants, ultimately improving the availability and pushing
the limits of operability of these facilities, thereby enabling scientific and
engineering advancements.
Comment: 17 pages, 8 figures, 2 tables
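The BO side of such a comparison reduces to a loop of fit-surrogate, maximize-acquisition, evaluate-plant. Below is a minimal Gaussian-process BO sketch with a UCB acquisition on a toy one-dimensional "plant"; the kernel length scale, UCB weight, and the quadratic objective are illustrative assumptions, not the study's setup.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-5):
    """GP posterior mean and variance at query points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ Kinv @ Ks)
    return mu, np.maximum(var, 1e-12)

def tune(objective, n_init=3, n_iter=15, kappa=2.0, seed=0):
    """Minimal BO loop: maximize `objective` on [0, 1] via UCB."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, n_init)
    y = np.array([objective(x) for x in X])
    grid = np.linspace(0.0, 1.0, 201)
    for _ in range(n_iter):
        mu, var = gp_posterior(X, y, grid)
        x_next = grid[np.argmax(mu + kappa * np.sqrt(var))]  # UCB
        X = np.append(X, x_next)
        y = np.append(y, objective(x_next))
    best = np.argmax(y)
    return X[best], y[best]

# Toy "plant": reward peaks at actuator setting 0.63.
best_x, best_y = tune(lambda x: -(x - 0.63) ** 2)
```

RLO, by contrast, would amortize this search: a policy trained offline maps observations directly to actuator settings, trading per-task sample efficiency at deployment for a large upfront training cost.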
Synaptic transistor with multiple biological functions based on metal-organic frameworks combined with the LIF model of a spiking neural network to recognize temporal information
Spiking neural networks (SNNs) have immense potential due to their use of synaptic plasticity and their ability to exploit temporal correlations at low power consumption. The leaky integrate-and-fire (LIF) model and spike-timing-dependent plasticity (STDP) are fundamental components of SNNs. Here, a neural device based on zeolitic imidazolate frameworks (ZIFs), an essential part of the synaptic transistor, is first demonstrated to simulate SNNs. Significantly, three typical functions between neurons (the memory function achieved through the hippocampus, synaptic weight regulation, and membrane potential triggered by ion migration) are effectively described through short-term/long-term memory (STM/LTM), long-term depression/long-term potentiation (LTD/LTP) and LIF, respectively. Furthermore, the update rule for the iteration weights in backpropagation, based on the time interval between presynaptic and postsynaptic pulses, is extracted and fitted from the STDP. In addition, the postsynaptic currents of the channel connect directly to a very-large-scale integration (VLSI) implementation of the LIF model that converts high-frequency information into sparse pulses based on the membrane-potential threshold. The leaky-integrator block, firing/detector block and frequency-adaptation block instantaneously release the accumulated voltage to form pulses. Finally, we re-encode the steady-state visual evoked potentials (SSVEPs) of the electroencephalogram (EEG) using the filter characteristics of the LIF model. SNNs deeply fused with synaptic transistors are designed to recognize the 40 different EEG frequencies, improving accuracy to 95.1%. This work represents an advanced contribution to brain-like chips and promotes the systematization and diversification of artificial intelligence.
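The LIF dynamics at the core of the abstract (leaky integration toward rest, a spike and reset at threshold, converting an analog drive into a sparse spike train) can be sketched in a few lines. Parameter values below are generic textbook choices, not the device's measured constants.

```python
import numpy as np

def lif_simulate(input_current, dt=1.0, tau=20.0, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron (Euler integration).

    The membrane potential leaks toward v_rest, integrates the input,
    and emits a spike then resets whenever it crosses v_thresh,
    turning a continuous signal into a sparse binary spike train."""
    v = v_rest
    spikes = []
    for i_t in input_current:
        v += dt / tau * (-(v - v_rest) + i_t)
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset  # fire and reset
        else:
            spikes.append(0)
    return np.array(spikes)

# Constant drive above threshold yields regular spiking; the spike rate
# grows with the drive, which is what lets LIF act as a frequency filter.
spikes_mid = lif_simulate(np.full(200, 1.5))
spikes_hi = lif_simulate(np.full(200, 3.0))
```

A sub-threshold drive (steady-state potential below `v_thresh`) produces no spikes at all, which is the thresholding behaviour used to sparsify the high-frequency EEG input.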