25 research outputs found

    Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning

    Large language models (LLMs) have recently shown great potential for in-context learning, where LLMs learn a new task simply by conditioning on a few input-label pairs (prompts). Despite their potential, our understanding of the factors influencing end-task performance and the robustness of in-context learning remains limited. This paper aims to bridge this knowledge gap by investigating the reliance of LLMs on shortcuts or spurious correlations within prompts. Through comprehensive experiments on classification and extraction tasks, we reveal that LLMs are "lazy learners" that tend to exploit shortcuts in prompts for downstream tasks. Additionally, we uncover a surprising finding that larger models are more likely to utilize shortcuts in prompts during inference. Our findings provide a new perspective on evaluating robustness in in-context learning and pose new challenges for detecting and mitigating the use of shortcuts in prompts

    From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework

    Textual adversarial attacks can discover models' weaknesses by adding semantics-preserving but misleading perturbations to the inputs. The long-running adversarial attack-and-defense arms race in Natural Language Processing (NLP) is algorithm-centric, providing valuable techniques for automatic robustness evaluation. However, the existing practice of robustness evaluation may suffer from incomplete evaluation, impractical evaluation protocols, and invalid adversarial samples. In this paper, we aim to set up a unified automatic robustness evaluation framework, shifting towards model-centric evaluation to further exploit the advantages of adversarial attacks. To address the above challenges, we first determine robustness evaluation dimensions based on model capabilities and specify a reasonable algorithm to generate adversarial samples for each dimension. Then we establish the evaluation protocol, including evaluation settings and metrics, under realistic demands. Finally, we use the perturbation degree of adversarial samples to control sample validity. We implement a toolkit, RobTest, that realizes our automatic robustness evaluation framework. In our experiments, we conduct a robustness evaluation of RoBERTa models to demonstrate the effectiveness of our evaluation framework, and further show the rationality of each component in the framework. The code will be made public at https://github.com/thunlp/RobTest. Comment: Accepted to Findings of ACL 202

    Trust-based Service Recommendation in Social Network

    As the number of Web services on the Internet keeps growing, recommending personalized Web services to users has become increasingly important. Some existing service recommendation systems use influence ranking and collaborative filtering algorithms. However, they neither consider trust relationships among users nor handle the cold-start problem well. The popularity of social networks offers a good alternative for service recommendation that avoids these issues. In this study, we propose a social network-based service recommendation method that considers users' historical service invocation behaviors, users' preferences, the trust relationships among users implied in the social network, and users' comments/reviews on services. We applied this method to a data set extracted from www.epinions.com. A series of experiments on 86,719 users, 604,190 user trust relationships, and 963,591 reviews of 292,713 services/products shows that this recommendation method achieves better recall, precision, F-measure, and rank score.
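The abstract above reports recall, precision, and F-measure but does not spell out the evaluation protocol. As a minimal sketch, assuming the standard set-based definitions applied to one user's recommendation list, these metrics could be computed as follows. The function name `precision_recall_f1` and the service identifiers are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall, and F-measure for one recommendation list.

    Assumption: `recommended` holds the services suggested to a user and
    `relevant` holds the services the user actually invoked or rated well.
    """
    recommended_set = set(recommended)
    relevant_set = set(relevant)
    hits = len(recommended_set & relevant_set)  # correctly recommended services

    precision = hits / len(recommended_set) if recommended_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy example with made-up service IDs.
p, r, f = precision_recall_f1(["s1", "s2", "s3", "s4"], ["s2", "s4", "s7"])
print(round(p, 3), round(r, 3), round(f, 3))  # prints: 0.5 0.667 0.571
```

In a full evaluation these per-user values would typically be averaged over all test users; whether the paper does so is not stated in the abstract.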

    SAM-Net: Integrating Event-Level and Chain-Level Attentions to Predict What Happens Next

    Scripts represent knowledge of event sequences that can help text understanding. Script event prediction requires measuring the relation between an existing chain and the subsequent event. The dominant approaches focus either on the effects of individual events or on the influence of the chain sequence. However, considering only individual events loses many semantic relations within the event chain, and considering only the sequence of the chain introduces much noise. We observe that both the individual events and the event segments within the chain can facilitate the prediction of the subsequent event. This paper develops a self-attention mechanism to focus on diverse event segments within the chain, and the event chain is represented as a set of event segments. We use event-level attention to model the relations between subsequent events and individual events. Then, we propose chain-level attention to model the relations between subsequent events and event segments within the chain. Finally, we integrate event-level and chain-level attention to interact with the chain and predict what happens next. Comprehensive experimental results on the widely used New York Times corpus, evaluated with the Multi-Choice Narrative Cloze task, demonstrate that our model achieves better results than other state-of-the-art baselines.

    Natural Sciences Publishing Cor. A Trust Evaluation Mechanism for Collaboration of Data-Intensive Services in Cloud

    Trust and reputation for services have emerged as an important issue in cloud computing. As data-intensive services in the cloud are used in more and more fields, trust evaluation for the collaboration of services faces more challenges. When data-intensive services take part in a collaboration, there are not only logical dependencies but also data dependencies among partner services. This paper proposes a novel trust evaluation method for collaborations of data-intensive services. It considers not only the trust in individual partner services and the explicit trust relations among partner services that have logical dependencies on each other, but also the implicit trust relations implied by data dependencies among services. A series of experiments, using the simulation tool NetLogo, is carried out to compare the evaluation results of the proposed method with those of a method that does not consider data dependencies. The results show that taking data-dependency trust into account improves the accuracy of trust evaluation to a great extent.

    Vertically Aligned Sn⁴⁺ Preintercalated Ti₂CTₓ MXene Sphere with Enhanced Zn Ion Transportation and Superior Cycle Lifespan

    While 2D MXenes have been widely used in energy storage systems, surface barriers induced by restacking of nanosheets and the limited kinetics resulting from insufficient interlayer spacing are two unresolved issues. Here, an Sn⁴⁺ preintercalated Ti₂CTₓ with effectively enlarged interlayer spacing is synthesized. The preintercalated Ti₂CTₓ is aligned on a carbon sphere to further enhance ion transportation by shortening the ion diffusion path and enhancing the reaction kinetics. As a result, when paired with a Zn anode, 12 500 cycles, which equals 2 800 h of cycle time, and 5% capacity fluctuation are obtained, surpassing all reported MXene-based aqueous electrodes. At 0.1 A g⁻¹, the capacity reaches 138 mAh g⁻¹, and 92 mAh g⁻¹ remains even at 5 A g⁻¹. In addition, a low anti-self-discharge rate of 0.989 mV h⁻¹, associated with a high capacity retention of 80.5% over 548 h, is obtained. Moreover, the fabricated quasi-solid capacitor based on a hydrogel film electrolyte exhibits good mechanical deformation and weather resistance. This work applies both preintercalation and alignment to MXene and achieves enhanced ion diffusion kinetics in an aqueous zinc ion capacitor (ZIC) system, an approach that may be applied to other MXene batteries for enhanced performance.

    In Situ Electrochemical Synthesis of MXenes without Acid/Alkali Usage in/for an Aqueous Zinc Ion Battery

    The traditional method to fabricate a MXene-based energy storage device starts from etching MAX phase particles with dangerous acid/alkali etchants to MXenes, followed by device assembly. This is a multistep protocol and is not environmentally friendly. Herein, an all-in-one protocol is proposed to integrate synthesis and battery fabrication of MXene. By choosing a special F-rich electrolyte, MAX V₂AlC is directly exfoliated inside a battery and the obtained V₂CTₓ MXene is used in situ to achieve an excellent battery performance. This is a one-step process with all reactions inside the cell, avoiding any contamination of the external environment. Over its lifetime, the device experiences three stages: exfoliation, electrode oxidation, and redox of V₂O₅. While the electrode is changing, the device can always be used as a battery and the performance is continuously enhanced. The resulting aqueous zinc ion battery achieves outstanding cycling stability (4000 cycles) and rate performance (97.5 mAh g⁻¹ at 64 A g⁻¹), distinct from all reported aqueous MXene-based counterparts with pseudo-capacitive properties, and outperforming most vanadium-based zinc ion batteries with high capacity. This work sheds light on the green synthesis of MXenes, provides an all-in-one protocol for MXene devices, and extends MXenes' application in the aqueous energy storage field.

    Research of the Distribution of Tongue Features of Diabetic Population Based on Unsupervised Learning Technology

    Background. The prevalence of diabetes increases year by year, posing a severe threat to human health. Current treatments struggle to prevent the progression of diabetes and its complications. Individualized treatment of diabetes is imperative, but current diagnostic methods make it difficult to specify an individualized treatment plan. Objective. To clarify the distribution of tongue features in the diabetic population and provide a diagnostic basis for individualized traditional Chinese medicine (TCM) treatment of diabetes. Methods. We used the TFDA-1 tongue diagnosis instrument to collect tongue images of people with diabetes and calculated the color features, texture features, and tongue coating ratio features with the Tongue Diagnosis Analysis System (TDAS). We then used K-means and Self-organizing Map (SOM) networks to analyze the distribution of tongue features in people with diabetes. Statistical analysis of TDAS features was used to identify differences between clusters. Results. The silhouette coefficient of the K-means clustering result is 0.194, and the silhouette coefficient of the SOM clustering result is 0.127. SOM Cluster 3 and Cluster 4 are derived from K-means Cluster 1, and the intersections account for (76.7% and 97.5%) and (22.3% and 70.4%), respectively. K-means Cluster 2 and SOM Cluster 1 are highly overlapping, with intersection ratios of 66.9% and 95.0%. K-means Cluster 3 and SOM Cluster 2 are highly overlapping, with intersection ratios of 94.1% and 82.1%. In the K-means clustering results, TB-a and TC-a of Cluster 3 are the highest (P<0.001), TB-a of Cluster 2 is the lowest (P<0.001), and TB-a of Cluster 1 is between Cluster 2 and Cluster 3 (P<0.001). Cluster 1 has the highest TB-b and TC-b (P<0.001), Cluster 2 has the lowest TB-b and TC-b (P<0.001), and TB-b and TC-b of Cluster 3 are between Cluster 1 and Cluster 2 (P<0.001). Cluster 1 has the highest TB-ASM and TC-ASM (P<0.001), Cluster 3 has the lowest TB-ASM and TC-ASM (P<0.001), and TB-ASM and TC-ASM of Cluster 2 are between Cluster 1 and Cluster 3 (P<0.001). CON, ENT, and MEAN show the opposite trend. Cluster 2 has the highest Per-all (P<0.001). SOM divides K-means Cluster 1 into two categories. There is almost no difference in texture features between Cluster 3 and Cluster 4 in the SOM clustering results. Cluster 3's TB-L, TC-L, and Per-all are lower than those of Cluster 4 (P<0.001), and Cluster 3's TB-a, TC-a, TB-b, TC-b, and Per-part are higher than those of Cluster 4 (P<0.001). Conclusions. The precise tongue image features calculated by TDAS are the basis for characterizing the disease state of people with diabetes. Unsupervised learning combined with statistical analysis is an important means of discovering subtle changes in the tongue features of people with diabetes. The machine vision analysis method based on unsupervised machine learning classifies the diabetic population according to fine tongue features. It provides a diagnostic basis for specifying a TCM treatment plan for diabetes.
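The abstract above quotes pairs of intersection ratios between a K-means cluster and an SOM cluster without defining them. One plausible reading, offered here only as an assumption, is that each pair is the number of shared samples divided by the size of each of the two clusters in turn. A minimal sketch under that assumption follows; the function name `overlap_ratios` and the toy labels are hypothetical and do not come from the study.

```python
def overlap_ratios(labels_a, labels_b, cluster_a, cluster_b):
    """Share of the intersection relative to each of two clusters.

    labels_a and labels_b are per-sample cluster labels from two clusterings
    of the same samples (for example K-means and SOM), in the same order.
    Returns (|A ∩ B| / |A|, |A ∩ B| / |B|).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same samples")

    members_a = {i for i, lab in enumerate(labels_a) if lab == cluster_a}
    members_b = {i for i, lab in enumerate(labels_b) if lab == cluster_b}
    shared = len(members_a & members_b)

    ratio_a = shared / len(members_a) if members_a else 0.0
    ratio_b = shared / len(members_b) if members_b else 0.0
    return ratio_a, ratio_b


# Toy labels for eight samples (made up for illustration).
kmeans_labels = [1, 1, 1, 1, 2, 2, 3, 3]
som_labels = [3, 3, 3, 4, 1, 1, 2, 2]

# K-means Cluster 1 versus SOM Cluster 3.
print(overlap_ratios(kmeans_labels, som_labels, 1, 3))  # prints: (0.75, 1.0)
```

Under this reading, a pair such as (0.75, 1.0) would mean that 75% of the first cluster falls inside the second, while the second cluster lies entirely inside the first.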