544 research outputs found
Electrical modulation of the edge channel transport in topological insulators coupled to ferromagnetic leads
The counterpropagating edge states of a two-dimensional topological insulator
(TI) carry electrons of opposite spins. We investigate the transport properties
of edge states in a two-dimensional TI which is contacted to ferromagnetic
leads. The application of a side-gate voltage induces a constriction or quantum
point contact (QPC) which couples the two edge channels. The transport
properties of the system are calculated via the Keldysh nonequilibrium Green's
function method. We find that inter-edge spin-flip coupling can significantly
enhance (suppress) the charge current when the magnetizations of the leads are
anti-parallel (parallel) to one another. On the other hand, spin-conserving
inter-edge coupling generally reduces the current by backscattering regardless
of the magnetization configuration. The charge current and the conductance as a
function of the bias voltage also exhibit similar trends with respect to the
spin-flip coupling strength, for both parallel and anti-parallel
configurations. Hence, gate-voltage modulation of edge states via a QPC can
provide a means of controlling the spin or charge current flow in TI-based
spintronics devices.
Comment: 6 pages, 3 figures, submitted to J. Appl. Phys.
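The abstract does not quote the transport formula; for non-interacting edge channels, Keldysh NEGF calculations of this kind typically evaluate a Landauer-type current of the standard form below (a sketch of the usual expression, not taken from the paper):

```latex
% Standard NEGF (Landauer-type) current between leads L and R:
% \Gamma_{L,R} are lead-coupling matrices, G^{r,a} retarded/advanced
% Green's functions, f_{L,R} the lead Fermi functions.
I = \frac{e}{h}\int dE\,
    \mathrm{Tr}\!\left[\Gamma_L\, G^r(E)\, \Gamma_R\, G^a(E)\right]
    \left[f_L(E) - f_R(E)\right]
```

In this picture, the spin-flip and spin-conserving inter-edge couplings at the QPC enter through the Green's functions and so reshape the transmission trace.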
Adversarial Training Towards Robust Multimedia Recommender System
With the prevalence of multimedia content on the Web, there is an urgent need
for recommender solutions that can effectively leverage the rich signal in
multimedia data. Owing to the success of deep neural networks in representation
learning, recent advances in multimedia recommendation have largely focused on
deep learning methods that improve recommendation accuracy. To date, however,
there has been little effort to investigate the robustness of multimedia
representations and their impact on the performance of multimedia
recommendation.
In this paper, we shed light on the robustness of multimedia recommender
systems. Using a state-of-the-art recommendation framework and deep image
features, we demonstrate that the overall system is not robust, such that a
small (but purposeful) perturbation on the input image will severely decrease
the recommendation accuracy. This implies the possible weakness of multimedia
recommender system in predicting user preference, and more importantly, the
potential of improvement by enhancing its robustness. To this end, we propose a
novel solution named Adversarial Multimedia Recommendation (AMR), which can
lead to a more robust multimedia recommender model by using adversarial
learning. The idea is to train the model to defend against an adversary, which adds
perturbations to the target image with the purpose of decreasing the model's
accuracy. We conduct experiments on two representative multimedia
recommendation tasks, namely, image recommendation and visually-aware product
recommendation. Extensive results verify the positive effect of adversarial
learning and demonstrate the effectiveness of our AMR method. Source code is
available at https://github.com/duxy-me/AMR.
Comment: TKDE
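As a sketch of the adversarial-learning idea above, the toy example below perturbs a deep image feature with an FGSM-style step that increases a pairwise BPR ranking loss. The linear scorer, dimensions, and all tensors are illustrative stand-ins, not the paper's actual model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_loss(u, e_pos, e_neg):
    # Pairwise BPR loss: the user should score the interacted item higher
    return -np.log(sigmoid(u @ e_pos - u @ e_neg))

def fgsm_perturb(u, e_pos, e_neg, eps=0.1):
    # Analytic gradient of the BPR loss w.r.t. the positive image feature
    s = sigmoid(u @ e_pos - u @ e_neg)
    grad = -(1.0 - s) * u
    # FGSM-style adversarial step along the gradient sign
    return e_pos + eps * np.sign(grad)

rng = np.random.default_rng(0)
u = rng.normal(size=8)       # user embedding (illustrative)
e_pos = rng.normal(size=8)   # deep image feature of an interacted item
e_neg = rng.normal(size=8)   # image feature of a sampled negative item

clean = bpr_loss(u, e_pos, e_neg)
adv = bpr_loss(u, fgsm_perturb(u, e_pos, e_neg), e_neg)
print(adv > clean)  # True: the small perturbation worsens the ranking loss
```

An adversarially trained recommender would then also minimize the loss on such perturbed features; that defence step is omitted here for brevity.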
NExT-Chat: An LMM for Chat, Detection and Segmentation
The development of large language models (LLMs) has greatly advanced the
field of multimodal understanding, leading to the emergence of large multimodal
models (LMMs). In order to enhance the level of visual comprehension, recent
studies have equipped LMMs with region-level understanding capabilities by
representing object bounding box coordinates as a series of text sequences
(pix2seq). In this paper, we introduce a novel paradigm for object location
modeling called pix2emb, in which we ask the LMM to output location
embeddings and then decode them with different decoders. This paradigm allows
us to use different location formats (such as bounding boxes and masks) in
multimodal conversations. Leveraging the proposed pix2emb method, we train an
LMM named NExT-Chat and demonstrate its capability of handling multiple tasks
like visual grounding, region captioning, and grounded reasoning. Comprehensive
experiments show the effectiveness of our NExT-Chat on various tasks, e.g.,
NExT-Chat (87.7) vs. Shikra (86.9) on POPE-Random, NExT-Chat (68.9) vs. LISA
(67.9) on the referring expression segmentation task, and NExT-Chat (79.6) vs.
Kosmos-2 (62.3) on the region caption task. The code and model are released at
https://github.com/NExT-ChatV/NExT-Chat.
Comment: Technical Report (https://next-chatv.github.io/)
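A minimal sketch of the pix2emb decoding idea: the LMM emits a continuous location embedding (e.g., at a hypothetical location token), which a small task-specific head decodes into a box. The MLP, dimensions, and weights below are invented for illustration, not NExT-Chat's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    h = np.maximum(0.0, x @ w1 + b1)   # ReLU hidden layer
    return h @ w2 + b2

D, H = 16, 32
# Hypothetical location embedding emitted by the LMM at a location token
loc_emb = rng.normal(size=D)

# Box decoder head: location embedding -> 4 normalized coordinates
w1, b1 = rng.normal(size=(D, H)), np.zeros(H)
w2, b2 = rng.normal(size=(H, 4)), np.zeros(4)
box = 1.0 / (1.0 + np.exp(-mlp(loc_emb, w1, b1, w2, b2)))  # squash to [0, 1]

print(box.shape)  # (4,) -- e.g. (cx, cy, w, h) in normalized image units
```

A mask decoder would consume the same embedding with a different head, which is what lets one conversation mix boxes and masks.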
Nematic topological superconducting phase in Nb-doped Bi2Se3
A nematic topological superconductor has an order parameter that spontaneously
breaks the crystalline symmetry in the superconducting state.
This state can be observed, for example, by thermodynamic or upper critical
field experiments in which a magnetic field is rotated with respect to the
crystalline axes. The corresponding physical quantity then directly reflects
the symmetry of the order parameter. We present a study on the superconducting
upper critical field of the Nb-doped topological insulator NbxBi2Se3 for
various magnetic field orientations parallel and perpendicular to the basal
plane of the Bi2Se3 layers. The data were obtained by two complementary
experimental techniques, magnetoresistance and DC magnetization, on three
different single-crystalline samples from the same batch. Both methods and all
samples show, in excellent agreement, that the in-plane upper critical fields
exhibit a two-fold symmetry that breaks the three-fold crystal symmetry. The
two-fold symmetry is found not only in the absolute value of the magnetization
of the initial zero-field-cooled branch of the hysteresis loop and in the
thermodynamic contribution above the irreversibility field, but also in
irreversible properties such as the characteristic irreversibility field and
the width of the hysteresis loop. This provides strong experimental evidence
that Nb-doped Bi2Se3 is a nematic topological superconductor similar to the
Cu- and Sr-doped Bi2Se3.
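Angle-resolved measurements of this kind are commonly summarized by fitting a two-fold modulation on top of an isotropic background; a minimal fit form (an assumption for illustration, not quoted from the paper) is:

```latex
% C2 modulation of the in-plane upper critical field:
% \phi is the in-plane field angle, \phi_0 the nematic axis.
H_{c2}(\phi) = H_0 + \Delta H \cos\!\left[\,2(\phi - \phi_0)\,\right]
```

A nonzero $\Delta H$ locked to a fixed $\phi_0$ across samples is the signature of nematicity, since the underlying lattice would only permit a three-fold (period $120^\circ$) modulation.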
On the Multi-turn Instruction Following for Conversational Web Agents
Web agents powered by Large Language Models (LLMs) have demonstrated
remarkable abilities in planning and executing multi-step interactions within
complex web-based environments, fulfilling a wide range of web navigation
tasks. Despite these advancements, the potential for LLM-powered agents to
effectively engage with sequential user instructions in real-world scenarios
has not been fully explored. In this work, we introduce a new task of
Conversational Web Navigation, which necessitates sophisticated interactions
that span multiple turns with both the users and the environment, supported by
a specially developed dataset named Multi-Turn Mind2Web (MT-Mind2Web). To
tackle the limited context length of LLMs and the context-dependency issue of
the conversational tasks, we further propose a novel framework, named
self-reflective memory-augmented planning (Self-MAP), which employs memory
utilization and self-reflection techniques. Extensive experiments are conducted
to benchmark the MT-Mind2Web dataset and to validate the effectiveness of the
proposed method.
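A toy sketch of the memory-utilization and self-reflection steps: retrieve past conversational turns relevant to the current instruction, then reflect to filter out distractors before planning. Keyword overlap stands in for the learned components; all names are illustrative, not the Self-MAP implementation:

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    instruction: str   # a past user instruction in the conversation
    action: str        # the web action the agent took for it

def retrieve(memory, query, k=2):
    # Memory utilization: score past turns by word overlap with the query
    def score(entry):
        return len(set(entry.instruction.split()) & set(query.split()))
    return sorted(memory, key=score, reverse=True)[:k]

def reflect(entries, query):
    # Self-reflection: discard retrieved snippets with no shared content
    return [e for e in entries if set(e.instruction.split()) & set(query.split())]

memory = [
    MemoryEntry("search flights to Paris", "CLICK search_box"),
    MemoryEntry("filter by price", "CLICK price_filter"),
    MemoryEntry("open cart", "CLICK cart_icon"),
]
query = "sort the Paris flights by price"
relevant = reflect(retrieve(memory, query), query)
print([e.action for e in relevant])  # -> ['CLICK search_box', 'CLICK price_filter']
```

The surviving snippets would then be placed in the (length-limited) LLM context when planning the next web action, which is how the framework copes with context-dependent multi-turn instructions.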
Transfer Visual Prompt Generator across LLMs
While developing a new vision-language LLM (VL-LLM) by pre-training from
scratch on a tremendous number of image-text pairs can be exceedingly
resource-consuming, connecting an existing LLM with a comparatively lightweight
visual prompt generator (VPG) has become a feasible paradigm. However, further
tuning the VPG part of the VL-LLM still incurs substantial computational costs,
i.e., thousands of GPU hours and millions of training examples. One alternative
solution is to transfer the VPG of an existing VL-LLM to the target VL-LLM.
In this work, we investigate for the first time the transferability of VPGs
across LLMs, and explore a solution to reduce the cost of VPG transfer. We
first study the VPG transfer across different LLM sizes (e.g., small-to-large),
and across different LLM types, through which we diagnose the key factors to
maximize the transfer efficiency. Based on our observation, we design a
two-stage transfer framework named VPGTrans, which is simple yet highly
effective. Through extensive experiments, we demonstrate that VPGTrans helps
significantly speed up the transfer learning process without compromising
performance. Remarkably, it helps achieve the VPG transfer from BLIP-2
OPT 2.7B to BLIP-2 OPT 6.7B with over 10 times speed-up and only
10.7% of the training data compared with connecting a VPG to OPT from
scratch. Further, a series of intriguing findings and the potential rationales
behind them are provided and discussed. Finally, we showcase the practical
value of our VPGTrans approach by customizing two novel VL-LLMs,
VL-LLaMA and VL-Vicuna, with the recently released LLaMA and Vicuna LLMs.
Comment: Project Website: https://vpgtrans.github.io Code:
https://github.com/VPGTrans/VPGTrans
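A hedged toy analogue of the two-stage recipe, in which linear maps stand in for the VPG, projector, and target LLM (this is not the VPGTrans implementation): stage 1 warms up only the projector with the inherited VPG frozen, and stage 2 then tunes both:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: the target "LLM" is assumed to expect features Y = X @ A.
D = 8
A = rng.normal(size=(D, D))      # behaviour the target LLM expects
vpg = rng.normal(size=(D, D))    # VPG inherited from the source VL-LLM
proj = np.eye(D)                 # projector re-initialised for the target

X = rng.normal(size=(64, D))     # a batch of "visual" inputs
Y = X @ A                        # features the target LLM would accept

def loss(vpg, proj):
    return float(np.mean((X @ vpg @ proj - Y) ** 2))

def step(vpg, proj, lr, train_vpg):
    err = (X @ vpg @ proj - Y) / len(X)
    g_proj = (X @ vpg).T @ err       # gradient w.r.t. the projector
    g_vpg = X.T @ err @ proj.T       # gradient w.r.t. the VPG weights
    if train_vpg:
        vpg = vpg - lr * g_vpg
    return vpg, proj - lr * g_proj

l0 = loss(vpg, proj)
for _ in range(300):                 # stage 1: warm up the projector only
    vpg, proj = step(vpg, proj, 0.005, train_vpg=False)
for _ in range(300):                 # stage 2: fine-tune VPG + projector
    vpg, proj = step(vpg, proj, 0.005, train_vpg=True)
print(loss(vpg, proj) < l0)
```

The intuition mirrored here is that warming up the cheap projector first gives the frozen, inherited VPG a good interface to the new LLM, so the expensive joint tuning in stage 2 starts close to a solution.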
Fine-Grained Scene Graph Generation with Data Transfer
Scene graph generation (SGG) is designed to extract (subject, predicate,
object) triplets in images. Recent works have made steady progress on SGG,
and provide useful tools for high-level vision and language understanding.
However, due to the data distribution problems including long-tail distribution
and semantic ambiguity, the predictions of current SGG models tend to collapse
to several frequent but uninformative predicates (e.g., on, at), which limits
the practical application of these models in downstream tasks. To deal with
these problems, we propose a novel Internal and External Data Transfer
(IETrans) method, which can be applied in a plug-and-play fashion and scaled
to large-scale SGG with 1,807 predicate classes. IETrans alleviates the
data distribution problem by automatically creating an enhanced dataset that
provides more plentiful and coherent annotations for all predicates. By
training on the enhanced dataset, a Neural Motif model doubles the macro
performance while maintaining competitive micro performance. The code and data
are publicly available at https://github.com/waxnkw/IETrans-SGG.pytorch.
Comment: ECCV 2022 (Oral)
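The internal-transfer step can be caricatured as relabelling uninformative general predicates with more specific ones for particular triplet patterns; the mapping below is invented for illustration and is not taken from IETrans (which derives such transfers automatically from model confusion):

```python
# Illustrative parent -> child relabelling for specific triplet patterns.
GENERAL_TO_SPECIFIC = {
    ("person", "on", "beach"): "standing on",
    ("cup", "on", "table"): "sitting on",
}

def internal_transfer(triplets):
    # Replace a general predicate with a more informative one when a
    # pattern matches; leave all other annotations untouched.
    out = []
    for subj, pred, obj in triplets:
        pred = GENERAL_TO_SPECIFIC.get((subj, pred, obj), pred)
        out.append((subj, pred, obj))
    return out

data = [("person", "on", "beach"), ("dog", "near", "tree")]
print(internal_transfer(data))
# -> [('person', 'standing on', 'beach'), ('dog', 'near', 'tree')]
```

Training on such an enriched label distribution is what lets a model predict informative predicates instead of collapsing to frequent ones like "on".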