
    Electrical modulation of the edge channel transport in topological insulators coupled to ferromagnetic leads

    The counterpropagating edge states of a two-dimensional topological insulator (TI) carry electrons of opposite spins. We investigate the transport properties of edge states in a two-dimensional TI contacted to ferromagnetic leads. The application of a side-gate voltage induces a constriction, or quantum point contact (QPC), which couples the two edge channels. The transport properties of the system are calculated via the Keldysh nonequilibrium Green's function method. We find that inter-edge spin-flip coupling can significantly enhance (suppress) the charge current when the magnetizations of the leads are anti-parallel (parallel) to one another. On the other hand, spin-conserving inter-edge coupling generally reduces the current through backscattering, regardless of the magnetization configuration. The charge current and the conductance as functions of the bias voltage also exhibit similar trends with respect to the spin-flip coupling strength, for both parallel and anti-parallel configurations. Hence, gate-voltage modulation of edge states via a QPC can provide a means of modulating the spin or charge current in TI-based spintronics devices.
    Comment: 6 pages, 3 figures, submitted to J. Appl. Phys.
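The Green's-function transport calculation this abstract refers to can be sketched in miniature. The following is a toy Landauer–NEGF computation for a hypothetical two-site scatterer with wide-band leads, not the paper's actual edge-state model; all energies, hoppings, and broadenings are illustrative stand-ins.

```python
import numpy as np

E = 0.1            # probe energy (hypothetical units)
t = 1.0            # inter-site hopping, standing in for the QPC coupling
H = np.array([[0.0, t],
              [t,   0.0]])              # toy scatterer Hamiltonian

gamma = 0.5                              # lead broadening (wide-band limit)
sigma_L = np.diag([-0.5j * gamma, 0.0])  # left-lead self-energy
sigma_R = np.diag([0.0, -0.5j * gamma])  # right-lead self-energy

# Retarded Green's function: G = [E - H - Sigma_L - Sigma_R]^{-1}
G = np.linalg.inv(E * np.eye(2) - H - sigma_L - sigma_R)

# Broadening matrices: Gamma = i (Sigma - Sigma^dagger)
Gamma_L = 1j * (sigma_L - sigma_L.conj().T)
Gamma_R = 1j * (sigma_R - sigma_R.conj().T)

# Landauer transmission: T(E) = Tr[Gamma_L G Gamma_R G^dagger]
T = np.real(np.trace(Gamma_L @ G @ Gamma_R @ G.conj().T))
```

The conductance then follows from T via G = (e^2/h) T per spin channel; the paper's Keldysh treatment generalizes this equilibrium-like formula to finite bias.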

    Adversarial Training Towards Robust Multimedia Recommender System

    With the prevalence of multimedia content on the Web, recommender solutions that can effectively leverage the rich signal in multimedia data are urgently needed. Owing to the success of deep neural networks in representation learning, recent advances in multimedia recommendation have largely focused on deep learning methods for improving recommendation accuracy. To date, however, there has been little effort to investigate the robustness of multimedia representations and their impact on the performance of multimedia recommendation. In this paper, we shed light on the robustness of multimedia recommender systems. Using a state-of-the-art recommendation framework and deep image features, we demonstrate that the overall system is not robust: a small (but purposeful) perturbation of the input image severely decreases the recommendation accuracy. This implies a possible weakness of multimedia recommender systems in predicting user preference and, more importantly, the potential for improvement by enhancing their robustness. To this end, we propose a novel solution named Adversarial Multimedia Recommendation (AMR), which yields a more robust multimedia recommender model through adversarial learning. The idea is to train the model to defend against an adversary that adds perturbations to the target image with the purpose of decreasing the model's accuracy. We conduct experiments on two representative multimedia recommendation tasks, namely image recommendation and visually-aware product recommendation. Extensive results verify the positive effect of adversarial learning and demonstrate the effectiveness of our AMR method. Source code is available at https://github.com/duxy-me/AMR.
    Comment: TKDE
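The adversary described in the abstract can be sketched on a deliberately simple stand-in model. Below, a hypothetical linear preference score is attacked FGSM-style: the image feature is nudged within a small L-infinity budget in the direction that lowers the score. The model, sizes, and budget are illustrative assumptions, not AMR's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
user = rng.normal(size=8)          # user embedding
W = rng.normal(size=(8, 8))        # model parameters of the toy scorer
feat = rng.normal(size=8)          # deep image feature of an item

def score(u, f):
    """Preference score of user u for an item with image feature f."""
    return u @ W @ f

# For this linear model the gradient of the score with respect to the
# feature is exact: d(score)/d(feat) = W^T u.
grad = W.T @ user
eps = 0.1
delta = -eps * np.sign(grad)       # worst-case drop within the budget

s_clean = score(user, feat)
s_adv = score(user, feat + delta)  # score under the perturbed feature
```

Adversarial training as in AMR would then add a loss term encouraging the model to rank items correctly even when fed `feat + delta`.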

    NExT-Chat: An LMM for Chat, Detection and Segmentation

    The development of large language models (LLMs) has greatly advanced the field of multimodal understanding, leading to the emergence of large multimodal models (LMMs). To enhance visual comprehension, recent studies have equipped LMMs with region-level understanding by representing object bounding-box coordinates as text sequences (pix2seq). In this paper, we introduce a novel paradigm for object location modeling, called pix2emb, in which the LMM outputs location embeddings that are then decoded by different decoders. This paradigm allows us to use different location formats (such as bounding boxes and masks) in multimodal conversations. Leveraging the proposed pix2emb method, we train an LMM named NExT-Chat and demonstrate its capability of handling multiple tasks such as visual grounding, region captioning, and grounded reasoning. Comprehensive experiments show the effectiveness of NExT-Chat on various tasks, e.g., NExT-Chat (87.7) vs. Shikra (86.9) on POPE-Random, NExT-Chat (68.9) vs. LISA (67.9) on referring expression segmentation, and NExT-Chat (79.6) vs. Kosmos-2 (62.3) on region captioning. The code and model are released at https://github.com/NExT-ChatV/NExT-Chat.
    Comment: Technical Report (https://next-chatv.github.io/)
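The pix2emb idea of one embedding, many output formats can be illustrated with toy decoders. Here a single hypothetical location embedding is decoded both into a normalized bounding box and into a coarse binary mask; the decoder weights, dimensions, and mask resolution are illustrative stand-ins, not the paper's actual heads.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
loc_emb = rng.normal(size=d)          # location embedding from the LMM

W_box = rng.normal(size=(4, d)) / np.sqrt(d)
def box_decoder(e):
    """Map an embedding to a normalized (cx, cy, w, h) box via sigmoid."""
    return 1.0 / (1.0 + np.exp(-(W_box @ e)))

W_mask = rng.normal(size=(8 * 8, d)) / np.sqrt(d)
def mask_decoder(e):
    """Map the same embedding to an 8x8 binary segmentation mask."""
    return (W_mask @ e).reshape(8, 8) > 0

box = box_decoder(loc_emb)    # bounding-box format
mask = mask_decoder(loc_emb)  # mask format, from the same embedding
```

Because the location is an embedding rather than a text token sequence (as in pix2seq), swapping output formats only means swapping the decoder head.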

    Nematic topological superconducting phase in Nb-doped Bi2Se3

    A nematic topological superconductor has an order-parameter symmetry that spontaneously breaks the crystalline symmetry in its superconducting state. This state can be observed, for example, in thermodynamic or upper-critical-field experiments in which a magnetic field is rotated with respect to the crystalline axes; the corresponding physical quantity then directly reflects the symmetry of the order parameter. We present a study of the superconducting upper critical field of the Nb-doped topological insulator NbxBi2Se3 for various magnetic field orientations parallel and perpendicular to the basal plane of the Bi2Se3 layers. The data were obtained by two complementary experimental techniques, magnetoresistance and DC magnetization, on three different single-crystalline samples from the same batch. Both methods and all samples show, in excellent agreement, that the in-plane upper critical fields exhibit a two-fold symmetry that breaks the three-fold crystal symmetry. The two-fold symmetry is found not only in the absolute value of the magnetization of the initial zero-field-cooled branch of the hysteresis loop and in the thermodynamic contribution above the irreversibility field, but also in the irreversible properties, such as the characteristic irreversibility field and the width of the hysteresis loop. This provides strong experimental evidence that Nb-doped Bi2Se3 is a nematic topological superconductor, similar to Cu- and Sr-doped Bi2Se3.

    On the Multi-turn Instruction Following for Conversational Web Agents

    Web agents powered by Large Language Models (LLMs) have demonstrated remarkable abilities in planning and executing multi-step interactions within complex web-based environments, fulfilling a wide range of web navigation tasks. Despite these advancements, the potential for LLM-powered agents to effectively engage with sequential user instructions in real-world scenarios has not been fully explored. In this work, we introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions spanning multiple turns with both the user and the environment, supported by a specially developed dataset named Multi-Turn Mind2Web (MT-Mind2Web). To tackle the limited context length of LLMs and the context-dependency issue of conversational tasks, we further propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques. Extensive experiments benchmark the MT-Mind2Web dataset and validate the effectiveness of the proposed method.
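The memory-augmentation side of such a framework can be sketched as similarity-based retrieval: past conversation turns are stored as embeddings, and only the most relevant ones are pulled back into the limited LLM context. The embeddings, memory size, and retrieval rule below are hypothetical stand-ins, not Self-MAP's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)
memory = rng.normal(size=(5, 16))    # 5 past turns as 16-d embeddings
memory /= np.linalg.norm(memory, axis=1, keepdims=True)

def retrieve(memory, query, k=2):
    """Return indices of the k memory entries most similar to the query."""
    q = query / np.linalg.norm(query)
    sims = memory @ q                # cosine similarity to each past turn
    return np.argsort(sims)[::-1][:k]

# A query that is close to past turn 3 (e.g., a follow-up instruction)
query = memory[3] + 0.01 * rng.normal(size=16)
top = retrieve(memory, query, k=2)
```

A self-reflection step would then filter or rewrite the retrieved snippets before they are fed to the planner, keeping the prompt within the context budget.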

    Transfer Visual Prompt Generator across LLMs

    While developing a new vision-language LLM (VL-LLM) by pre-training on tremendous image-text pairs from scratch can be exceedingly resource-consuming, connecting an existing LLM with a comparatively lightweight visual prompt generator (VPG) becomes a feasible paradigm. However, further tuning the VPG part of the VL-LLM still incurs substantial computational cost, i.e., thousands of GPU hours and millions of training examples. An alternative solution is to transfer an existing VPG from one VL-LLM to the target VL-LLM. In this work, we investigate for the first time the transferability of VPGs across LLMs and explore a solution to reduce the cost of VPG transfer. We first study VPG transfer across different LLM sizes (e.g., small-to-large) and across different LLM types, through which we diagnose the key factors for maximizing transfer efficiency. Based on our observations, we design a two-stage transfer framework named VPGTrans, which is simple yet highly effective. Through extensive experiments, we demonstrate that VPGTrans significantly speeds up the transfer learning process without compromising performance. Remarkably, it achieves VPG transfer from BLIP-2 OPT-2.7B to BLIP-2 OPT-6.7B with over 10x speed-up and only 10.7% of the training data compared with connecting a VPG to OPT-6.7B from scratch. Further, a series of intriguing findings and the potential rationales behind them are provided and discussed. Finally, we showcase the practical value of our VPGTrans approach by customizing two novel VL-LLMs, VL-LLaMA and VL-Vicuna, with the recently released LLaMA and Vicuna LLMs.
    Comment: Project website: https://vpgtrans.github.io Code: https://github.com/VPGTrans/VPGTrans
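One plausible reading of a two-stage transfer schedule can be written down as configuration: first warm up only the lightweight projector that connects the inherited VPG to the new LLM (frozen backbones, larger learning rate), then fine-tune the VPG and projector jointly at a smaller rate. Module names, learning rates, and epoch counts here are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical two-stage transfer schedule in the spirit of VPGTrans.
STAGES = [
    {"name": "projector_warmup",          # stage 1: adapt the glue layer
     "train": {"projector"},
     "freeze": {"vpg", "llm"},
     "lr": 1e-3, "epochs": 1},
    {"name": "joint_finetune",            # stage 2: tune VPG + projector
     "train": {"projector", "vpg"},
     "freeze": {"llm"},
     "lr": 1e-5, "epochs": 3},
]

def trainable(stage, module):
    """True if the module's parameters receive gradients in this stage."""
    return module in stage["train"]

plan = [(s["name"], sorted(s["train"])) for s in STAGES]
```

The point of the split is that most of the cheap adaptation happens in stage 1, so the expensive joint stage needs far fewer updates than training a VPG from scratch.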

    Fine-Grained Scene Graph Generation with Data Transfer

    Scene graph generation (SGG) aims to extract (subject, predicate, object) triplets from images. Recent works have made steady progress on SGG and provide useful tools for high-level vision and language understanding. However, due to data distribution problems, including long-tail distribution and semantic ambiguity, the predictions of current SGG models tend to collapse onto a few frequent but uninformative predicates (e.g., on, at), which limits the practical application of these models in downstream tasks. To deal with these problems, we propose a novel Internal and External Data Transfer (IETrans) method, which can be applied in a plug-and-play fashion and expanded to large-scale SGG with 1,807 predicate classes. IETrans relieves the data distribution problem by automatically creating an enhanced dataset that provides more sufficient and coherent annotations for all predicates. Trained on the enhanced dataset, a Neural Motif model doubles its macro performance while maintaining competitive micro performance. The code and data are publicly available at https://github.com/waxnkw/IETrans-SGG.pytorch.
    Comment: ECCV 2022 (Oral)
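The data-transfer idea can be sketched as relabeling: when a triplet carries a frequent, uninformative predicate but a model scores a finer predicate almost as highly, the sample is transferred to the finer label. The predicates, scores, and margin threshold below are hypothetical stand-ins, not IETrans's actual rule.

```python
def internal_transfer(triplets, general={"on"}, margin=0.1):
    """Relabel 'general' predicates when a finer predicate is competitive.

    Each triplet is (subject, predicate, object, scores), where scores is
    a dict of per-predicate model confidences (hypothetical values here).
    """
    out = []
    for s, p, o, scores in triplets:
        if p in general:
            best = max(scores, key=scores.get)
            # finest-grained competitor outside the general set
            alt = max((q for q in scores if q not in general),
                      key=scores.get)
            if scores[best] - scores[alt] < margin:
                p = alt          # transfer the sample to the finer label
        out.append((s, p, o))
    return out

triplets = [
    ("cup",  "on", "table", {"on": 0.50, "sitting on": 0.45}),
    ("sign", "on", "pole",  {"on": 0.90, "mounted on": 0.05}),
]
relabeled = internal_transfer(triplets)
```

The first sample moves to "sitting on" (the finer predicate is competitive), while the second keeps "on" (the model is confident the general label is right), so informative predicates gain training data without corrupting clear-cut cases.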