95 research outputs found

    SwitchGPT: Adapting Large Language Models for Non-Text Outputs

    Large Language Models (LLMs), primarily trained on text-based datasets, exhibit exceptional proficiency in understanding and executing complex linguistic instructions via text outputs. However, they falter when requested to generate non-text outputs. Concurrently, modality conversion models, such as text-to-image models, generate high-quality images but lack extensive textual pretraining. As a result, these models can only accommodate specific image descriptions rather than comprehend more complex instructions. To bridge this gap, we propose a novel approach, SwitchGPT, from a modality conversion perspective that evolves a text-based LLM into a multi-modal one. We specifically employ a minimal dataset to instruct LLMs to recognize the intended output modality as directed by the instructions. Consequently, the adapted LLM can effectively summon various off-the-shelf modality conversion models from the model zoos to generate non-text responses. This circumvents the necessity for complicated pretraining that typically requires immense quantities of paired multi-modal data, while simultaneously inheriting the extensive knowledge of LLMs and the ability of high-quality generative models. To evaluate and compare the adapted multi-modal LLM with its traditional counterparts, we have constructed a multi-modal instruction benchmark that solicits diverse modality outputs. The experimental results reveal that, with minimal training, LLMs can be conveniently adapted to comprehend requests for non-text responses, thus achieving higher flexibility in multi-modal scenarios. Code and data will be made available at https://github.com/xinke-wang/SwitchGPT

    Interpretable Ensemble Learning for Materials Property Prediction with Classical Interatomic Potentials: Carbon as an Example

    Machine learning (ML) is widely used to explore crystal materials and predict their properties. However, training deep-learning models is time-consuming, and the regression process is a black box that is hard to interpret. In addition, the preprocessing step that converts a crystal structure into the ML input, called a descriptor, needs to be designed carefully. To efficiently predict important properties of materials, we propose an approach based on ensemble learning with regression trees to predict formation energy and elastic constants, using small datasets of carbon allotropes as an example. Without using any descriptor, the inputs are the properties calculated by molecular dynamics with 9 different classical interatomic potentials. Overall, the results from ensemble learning are more accurate than those from classical interatomic potentials, and ensemble learning can capture the relatively accurate properties from the 9 classical potentials as criteria for predicting the final properties.

    Contrastive Vision-Language Alignment Makes Efficient Instruction Learner

    We study the task of extending the large language model (LLM) into a vision-language instruction-following model. This task is crucial but challenging since the LLM is trained on the text modality only, making it hard to effectively digest the visual modality. To address this, existing methods typically train a visual adapter to align the representation between a pre-trained vision transformer (ViT) and the LLM by a generative image captioning loss. However, we find that the generative objective can only produce weak alignment for vision and language, making the aligned vision-language model very hungry for instruction fine-tuning data. In this paper, we propose CG-VLM, which applies both Contrastive and Generative alignment objectives to effectively align the representation of ViT and LLM. Different from image-level and sentence-level alignment in common contrastive learning settings, CG-VLM aligns the image-patch level features and text-token level embeddings, which, however, is very hard to achieve as no explicit grounding patch-token relation is provided in standard image captioning datasets. To address this issue, we propose to maximize the averaged similarity between pooled image-patch features and text-token embeddings. Extensive experiments demonstrate that the proposed CG-VLM produces strong vision-language alignment and is an efficient instruction learner. For example, using only 10% instruction tuning data, we reach 95% performance of state-of-the-art method LLaVA [29] on the zero-shot ScienceQA-Image benchmark. Comment: 17 pages, 10 pages for main paper, 7 pages for supplementary

    Frequency Enhanced Hybrid Attention Network for Sequential Recommendation

    The self-attention mechanism, which has a strong capability for modeling long-range dependencies, is one of the most extensively used techniques in the sequential recommendation field. However, many recent studies show that current self-attention-based models are low-pass filters and are inadequate to capture high-frequency information. Furthermore, since the items in the user behaviors are intertwined with each other, these models are unable to distinguish the inherent periodicity obscured in the time domain. In this work, we shift the perspective to the frequency domain, and propose a novel Frequency Enhanced Hybrid Attention Network for Sequential Recommendation, namely FEARec. In this model, we first improve the original time domain self-attention in the frequency domain with a ramp structure so that both low-frequency and high-frequency information can be explicitly learned in our approach. Moreover, we additionally design a similar attention mechanism via auto-correlation in the frequency domain to capture the periodic characteristics and fuse the time and frequency level attention in a unified model. Finally, both contrastive learning and frequency regularization are utilized to ensure that multiple views are aligned in both the time domain and frequency domain. Extensive experiments conducted on four widely used benchmark datasets demonstrate that the proposed model performs significantly better than the state-of-the-art approaches. Comment: 11 pages, 7 figures, The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

    Unveiling the medium-range order in glass models and its role in glass formation

    The correlation between structure and glass formability in glassy systems is a long-standing puzzle. To solve this puzzle, many descriptors based on the short-range order (SRO) have been proposed. Here we show that the SRO, however, offers little help in explaining the glass formability and stability; instead it is the formation of medium-range order that stabilizes the glass against crystallization by suppressing the atomic rearrangement and compositional change. Our results provide a perspective for understanding the correlation between structure and stability in glasses.

    RITA: Boost Autonomous Driving Simulators with Realistic Interactive Traffic Flow

    High-quality traffic flow generation is the core module in building simulators for autonomous driving. However, the majority of available simulators are incapable of replicating traffic patterns that accurately reflect the various features of real-world data while also simulating human-like reactive responses to the tested autopilot driving strategies. As a step toward addressing this problem, we propose Realistic Interactive TrAffic flow (RITA) as an integrated component of existing driving simulators to provide high-quality traffic flow for the evaluation and optimization of the tested driving strategies. RITA is developed with consideration of three key features, i.e., fidelity, diversity, and controllability, and consists of two core modules called RITABackend and RITAKit. RITABackend is built to support vehicle-wise control and provide traffic generation models from real-world datasets, while RITAKit is developed with easy-to-use interfaces for controllable traffic generation via RITABackend. We demonstrate RITA's capacity to create diversified and high-fidelity traffic simulations in several highly interactive highway scenarios. The experimental findings demonstrate that the RITA traffic flows we produce exhibit all three key features, hence enhancing the completeness of driving strategy evaluation. Moreover, we showcase the possibility for further improvement of baseline strategies through online fine-tuning with RITA traffic flows. Comment: 8 pages, 5 figures, 3 tables

    Expression of Claudin-9 (CLDN9) in breast cancer, the clinical significance in connection with its subcoat anchorage proteins ZO-1 and ZO-3 and impact on drug resistance

    (1) Introduction: Claudin-9 (CLDN9) is a member of the claudin protein family, a critical transmembrane protein family for tight junctions that are implicated in the progression of numerous cancer types. The present study investigated the role that CLDN9, along with the subcoat proteins, Zonula Occludens (ZOs), plays in clinical breast cancer and its subsequent impact on the drug response of patients. (2) Methods: CLDN9 protein and CLDN9 transcript were determined and correlated with clinical and pathological indicators, together with the status of hormonal receptors. The levels of CLDN9 transcript were also assessed against the therapeutic responses of the patients to chemotherapies by using a dataset from the TCGA database. Breast cancer cell models, representing different molecular subtypes of breast cancer, with differential expression of CLDN9 were created and used to assess the biological impact and response to chemotherapeutic drugs. (3) Results: Breast cancer tissues expressed significantly higher levels of CLDN9, with the high levels being associated with shorter survival. CLDN9 was significantly correlated with its anchorage proteins ZO-1 and ZO-3. Integrated expression of CLDN9, ZO-1 and ZO-3 formed a signature that was significantly linked to overall survival (OS) (p = 0.013) and relapse-free survival (RFS) (p = 0.024) in an independent manner. CLDN9 transcript was significantly higher in patients who were resistant to chemotherapies (p < 0.000001). The connection of CLDN9 to chemoresistance was particularly prominent in patients with ER-positive (ER(+)), Her-2-negative (Her-2(−)), ER(+)/Her-2(−) and triple-negative breast cancers (TNBCs), but not in patients with HER-2-positive tumors. In Her-2-negative MCF7 and MDA-MB-231 cancer cells, loss of CLDN9 significantly increased sensitivity to several chemotherapeutic drugs including paclitaxel, gemcitabine and methotrexate, which was not seen in Her-2(+) SKBR3 cells. However, suppressing Her-2 using neratinib, a permanent Her-2 inhibitor, sensitized cellular response to these chemodrugs in cells with CLDN9 knockdown. (4) Conclusions: CLDN9 is an important prognostic indicator for patients with breast cancer and also a pivotal factor in assessing patient responses to chemotherapies. Her-2 is a negating factor for the treatment response prediction value by CLDN9, and negating Her-2 and CLDN9 may enhance breast cancer cellular response to chemotherapeutic drugs.

    Light control of surface–bulk coupling by terahertz vibrational coherence in a topological insulator

    The demand for disorder-tolerant quantum logic and spin electronics can be met by generating and controlling dissipationless spin currents protected by topology. Dirac fermions with helical spin-locking surface transport offer a way of achieving such a goal. Yet, surface-bulk coupling can lead to strong Dirac electron scattering with bulk carriers and phonons as well as impurities, assisted by such a dissipative channel, which results in “topological breakdown”. Here, we demonstrate that coherent lattice vibrations periodically driven by a single-cycle terahertz (THz) pulse can significantly suppress such a dissipative channel in topological insulators. This is achieved by reducing the phase space in the bulk available for Dirac fermions to scatter into during coherent lattice oscillations in Bi2Se3. This light-induced suppression manifests as a remarkable transition exclusively in surface transport, absent for bulk, above the THz electric fields for driving coherent phonons, which prolongs the surface transport lifetime. These results, together with simulations, identify the critical role of spin–orbit coupling for the “phase space contraction” mechanism that suppresses the surface-bulk coupling. Imposing vibrational quantum coherence into topological states of matter may become a universal light control principle for reinforcing the symmetry-protected helical transport.

    Fibroblast Growth Factor 21 Deficiency Attenuates Experimental Colitis-Induced Adipose Tissue Lipolysis

    Aims. Nutrient deficiencies are common in patients with inflammatory bowel disease (IBD). Adipose tissue plays a critical role in regulating energy balance. Fibroblast growth factor 21 (FGF21) is an important endocrine metabolic regulator with emerging beneficial roles in lipid homeostasis. We investigated the impact of FGF21 in experimental colitis-induced epididymal white adipose tissue (eWAT) lipolysis. Methods. Mice were given 2.5% dextran sulfate sodium (DSS) ad libitum for 7 days to induce colitis. The role of FGF21 was investigated using antibody neutralization or knockout (KO) mice. Lipolysis index and adipose lipolytic enzymes were determined. In addition, 3T3-L1 cells were pretreated with IL-6, followed by recombinant human FGF21 (rhFGF21) treatment; lipolysis was assessed. Results. DSS markedly decreased eWAT/body weight ratio and increased serum concentrations of free fatty acid (FFA) and glycerol, indicating increased adipose tissue lipolysis. eWAT intracellular lipolytic enzyme expression/activation was significantly increased. These alterations were significantly attenuated in FGF21 KO mice and by circulating FGF21 neutralization. Moreover, DSS treatment markedly increased serum IL-6 and FGF21 levels. IL-6 pretreatment was necessary for the stimulatory effect of FGF21 on adipose lipolysis in 3T3-L1 cells. Conclusions. Our results demonstrate that experimental colitis induces eWAT lipolysis via an IL-6/FGF21-mediated signaling pathway.

    Metastatic Lymph Node 64 (MLN64) expression in gastric cancer, the clinical and molecular implications in drug resistance

    Background/Aim: Metastatic Lymph Node 64 (MLN64) is often co-amplified with ERBB2 (HER2) and plays a role in the progression of breast and prostate cancers. The present study explored the expression of MLN64 in clinical gastric cancer in association with the ERBB family and its impact on drug resistance in patients. Materials and Methods: Two independent gastric cancer cohorts (n=324; n=87) were used to explore the expression profile of MLN64 in conjunction with ERBB family members in clinical gastric cancer and its association with neoadjuvant chemotherapy responses. Gastric cancer AGS and HCG27 cells with MLN64 knockdown were generated to determine the function of MLN64 in cell behavioral changes. Results: Gastric tumor tissues expressed significantly increased levels of MLN64 compared with normal tissues (p<0.01); however, MLN64 alone was a weak prognostic indicator. An integrated co-expression of MLN64, ERBB4, and NRG4 was a significant factor in assessing overall survival in both cohorts. MLN64 was a profound indicator of patient response to neoadjuvant chemotherapy. In vitro studies indicated a significant contribution of MLN64 to the response of gastric cancer cells to chemodrugs and Her-2 inhibitors. MLN64 knockdown also contributed to the adhesiveness and migration and suggested a possible mechanism mediated by the interaction between MLN64 and ERBBs. Conclusion: MLN64 is an indicator for patient response to neoadjuvant chemotherapies in gastric cancer. Together with the expression pattern of ERBB4, it is a poor prognostic factor in gastric cancer patients.