27 research outputs found

    Refining Implicit Argument Annotation for UCCA

    Predicate-argument structure analysis is a central component in meaning representations of text. The fact that some arguments are not explicitly mentioned in a sentence gives rise to ambiguity in language understanding, and renders it difficult for machines to interpret text correctly. However, only a few resources represent implicit roles for NLU, and existing studies in NLP only make coarse distinctions between categories of arguments omitted from linguistic form. This paper proposes a typology for fine-grained implicit argument annotation on top of Universal Conceptual Cognitive Annotation's foundational layer. The proposed implicit argument categorisation is driven by theories of implicit role interpretation and consists of six types: Deictic, Generic, Genre-based, Type-identifiable, Non-specific, and Iterated-set. We exemplify our design by revisiting part of the UCCA EWT corpus, providing a new dataset annotated with the refinement layer, and making a comparative analysis with other schemes. Comment: DMR 202

    Cultural Adaptation of Recipes

    Building upon the considerable advances in Large Language Models (LLMs), we are now equipped to address more sophisticated tasks demanding a nuanced understanding of cross-cultural contexts. A key example is recipe adaptation, which goes beyond simple translation to include a grasp of ingredients, culinary techniques, and dietary preferences specific to a given culture. We introduce a new task involving the translation and cultural adaptation of recipes between Chinese and English-speaking cuisines. To support this investigation, we present CulturalRecipes, a unique dataset composed of automatically paired recipes written in Mandarin Chinese and English. This dataset is further enriched with a human-written and curated test set. In this intricate task of cross-cultural recipe adaptation, we evaluate the performance of various methods, including GPT-4 and other LLMs, traditional machine translation, and information retrieval techniques. Our comprehensive analysis includes both automatic and human evaluation metrics. While GPT-4 exhibits impressive abilities in adapting Chinese recipes into English, it still lags behind human expertise when translating English recipes into Chinese. This underscores the multifaceted nature of cultural adaptations. We anticipate that these insights will significantly contribute to future research on culturally-aware language models and their practical application in culturally diverse contexts. Comment: Accepted to TAC

    AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models

    Evaluating the general abilities of foundation models to tackle human-level tasks is a vital aspect of their development and application in the pursuit of Artificial General Intelligence (AGI). Traditional benchmarks, which rely on artificial datasets, may not accurately represent human-level capabilities. In this paper, we introduce AGIEval, a novel benchmark specifically designed to assess foundation models in the context of human-centric standardized exams, such as college entrance exams, law school admission tests, math competitions, and lawyer qualification tests. We evaluate several state-of-the-art foundation models, including GPT-4, ChatGPT, and Text-Davinci-003, using this benchmark. Impressively, GPT-4 surpasses average human performance on SAT, LSAT, and math competitions, attaining a 95% accuracy rate on the SAT Math test and a 92.5% accuracy on the English test of the Chinese national college entrance exam. This demonstrates the extraordinary performance of contemporary foundation models. In contrast, we also find that GPT-4 is less proficient in tasks that require complex reasoning or specific domain knowledge. Our comprehensive analyses of model capabilities (understanding, knowledge, reasoning, and calculation) reveal these models' strengths and limitations, providing valuable insights into future directions for enhancing their general capabilities. By concentrating on tasks pertinent to human cognition and decision-making, our benchmark delivers a more meaningful and robust evaluation of foundation models' performance in real-world scenarios. The data, code, and all model outputs are released at https://github.com/microsoft/AGIEval. Comment: 19 page

    Great Service! Fine-grained Parsing of Implicit Arguments

    Broad-coverage meaning representations in NLP mostly focus on explicitly expressed content. More importantly, the scarcity of datasets annotating diverse implicit roles limits empirical studies into their linguistic nuances. For example, in the web review "Great service!", the provider and consumer are implicit arguments of different types. We examine an annotated corpus of fine-grained implicit arguments (Cui and Hershcovich, 2020) by carefully re-annotating it, resolving several inconsistencies. Subsequently, we present the first transition-based neural parser that can handle implicit arguments dynamically, and experiment with two different transition systems on the improved dataset. We find that certain types of implicit arguments are more difficult to parse than others and that the simpler system is more accurate in recovering implicit arguments, despite having a lower overall parsing score, attesting to current reasoning limitations of NLP models. This work will facilitate a better understanding of implicit and underspecified language, by incorporating it holistically into meaning representations. Comment: IWPT 202

    HUJI-KU at MRP 2020: Two Transition-based Neural Parsers


    How Conservative are Language Models? Adapting to the Introduction of Gender-Neutral Pronouns

    Gender-neutral pronouns have recently been introduced in many languages a) to include non-binary people and b) to serve as a generic singular. Recent results from psycholinguistics suggest that gender-neutral pronouns (in Swedish) are not associated with human processing difficulties. This, we show, is in sharp contrast with automated processing. We show that gender-neutral pronouns in Danish, English, and Swedish are associated with higher perplexity, more dispersed attention patterns, and worse downstream performance. We argue that such conservativity in language models may limit widespread adoption of gender-neutral pronouns and must therefore be resolved. Comment: To appear at NAACL 202

    Analysis of Drought Vulnerability Characteristics and Risk Assessment Based on Information Distribution and Diffusion in Southwest China

    Drought vulnerability characteristics and risk assessment form the basis of drought risk management. In this study, the standardized precipitation index (SPI) and drought damage rates (DDR) were combined to analyze drought vulnerability characteristics and drought risk in Southwest China (SC). The information distribution method was applied to estimate the probability density of the drought strength (DS) and the two-dimensional normal information diffusion method was used to construct the vulnerability relationships between DS and drought damage (DD). The risk was then evaluated by combining the probability function of the DS and the DD vulnerability curve. The results showed that the relationship between the DS and the DD was nonlinear in SC and its provinces. With the increase in DS, the degree of DD increased gradually, stabilized, or decreased toward the end. However, the vulnerability characteristics of the different provinces varied widely due to multiple risk-bearing bodies and abilities to resist disasters. The risk values obtained across the range of time scales of the SPI were not significantly different. The yielding probabilities will be reduced for the crop area by 10%, 30%, and 70% due to drought. Compared to a normal year in SC, the probability values were 16.04%, 10.29%, and 2.70%, respectively. These results have the potential to provide a reference for agricultural production and drought risk management.

    Flexible ITO-free organic solar cells over 10% by employing drop-coated conductive PEDOT:PSS transparent anodes
