4 research outputs found

    Personal named entity linking based on simple partial tree matching and context free grammar

    Personal name disambiguation is the task of linking a personal name to a unique entry in a real-world knowledge base, a task also known as named entity linking (NEL). NEL algorithms consist of three main components: an extractor, a searcher, and a disambiguator. Existing approaches generate the set of candidate entities for each mentioned name by exact-match look-up over the surface form. Exact-match look-up is wholly inadequate for candidate generation because personal names within a web page lack uniform representation. In addition, the performance of a disambiguator in ranking candidate entities is limited by context similarity, which is an inflexible feature for personal name disambiguation because natural language is highly variable. We propose a new approach that both identifies and disambiguates personal names mentioned on a web page. Our NEL algorithm uses, as its extractor, a control flow graph together with AlchemyAPI; as its searcher, Personal Name Transformation Modules (PNTM) based on context-free grammar and the Jaro-Winkler text similarity metric; and as its disambiguator, an entity coherence method built on the Occupation Architecture for Personal Name Disambiguation (OAPnDis), personal name concepts, and Simple Partial Tree Matching (SPTM). Experimental results on real-world data sets show that our NEL reaches 92% accuracy, higher than the accuracy of previously used methods.
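    Since the searcher scores name variants with the Jaro-Winkler metric, a small sketch may help make that candidate-matching step concrete. Below is a minimal self-contained implementation with a toy ranking example; the function names and sample names are illustrative assumptions, not the authors' code.

    ```python
    # Minimal Jaro-Winkler similarity, of the kind a searcher like PNTM could
    # use to score candidate name variants. Illustrative sketch only.
    def jaro(s1: str, s2: str) -> float:
        """Jaro similarity: matching characters, penalized for transpositions."""
        if s1 == s2:
            return 1.0
        len1, len2 = len(s1), len(s2)
        if len1 == 0 or len2 == 0:
            return 0.0
        window = max(len1, len2) // 2 - 1  # how far apart matches may sit
        match1, match2 = [False] * len1, [False] * len2
        matches = 0
        for i, ch in enumerate(s1):
            for j in range(max(0, i - window), min(len2, i + window + 1)):
                if not match2[j] and s2[j] == ch:
                    match1[i] = match2[j] = True
                    matches += 1
                    break
        if matches == 0:
            return 0.0
        # Count transpositions among the matched characters.
        t, k = 0, 0
        for i in range(len1):
            if match1[i]:
                while not match2[k]:
                    k += 1
                if s1[i] != s2[k]:
                    t += 1
                k += 1
        t //= 2
        return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

    def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
        """Boost the Jaro score for strings sharing a prefix (up to 4 chars)."""
        sim = jaro(s1, s2)
        prefix = 0
        for a, b in zip(s1[:4], s2[:4]):
            if a != b:
                break
            prefix += 1
        return sim + prefix * p * (1 - sim)

    # Rank candidate entities for a mention by surface-form similarity.
    candidates = ["Jonathan Smith", "John Smith", "Jane Smyth"]
    mention = "Jon Smith"
    print(sorted(candidates, key=lambda c: jaro_winkler(mention.lower(), c.lower()), reverse=True))
    ```

    On "martha" vs. "marhta" this returns roughly 0.961, the standard textbook value, which is why the metric tolerates the spelling variation that exact-match look-up cannot.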

    Entity Linking in Low-Annotation Data Settings

    Recent advances in natural language processing have focused on applying and adapting large pretrained language models to specific tasks. These models, such as BERT (Devlin et al., 2019) and BART (Lewis et al., 2020a), are pretrained on massive amounts of unlabeled text across a variety of domains. The impact of these pretrained models is visible in the task of entity linking, where a mention of an entity in unstructured text is matched to the relevant entry in a knowledge base. State-of-the-art linkers, such as those of Wu et al. (2020) and De Cao et al. (2021), leverage pretrained models as a foundation for their systems. However, these models are also trained on large amounts of annotated data, which is crucial to their performance. These large datasets often consist of domains that are easy to annotate, such as Wikipedia or newswire text. Tailoring NLP tools to such a narrow range of textual domains severely restricts their use in the real world. Many other domains, such as medicine or law, do not have large amounts of entity linking annotations available. Entity linking, which serves to bridge the gap between massive amounts of unstructured text and structured repositories of knowledge, is equally crucial in these domains. Yet tools trained on newswire or Wikipedia annotations are unlikely to be well suited to identifying medical conditions mentioned in clinical notes. As most annotation efforts focus on English, similar challenges arise in building systems for non-English text. Because these domains typically offer only a small amount of annotated data, it is often necessary to look to other types of domain-specific data, such as unannotated text or highly curated structured knowledge bases. In these settings, it is crucial to translate lessons from tools tailored for high-annotation domains into algorithms suited to low-annotation domains. This requires both leveraging broader types of data and understanding the unique challenges present in each domain.
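    To make the linking task concrete, here is a minimal bi-encoder sketch in the spirit of Wu et al. (2020): embed the mention and each knowledge-base entry with a pretrained encoder and rank entries by cosine similarity. The model choice, the clinical example, and the two-entry toy knowledge base are assumptions for illustration, not the systems described above.

    ```python
    # Bi-encoder entity-linking sketch: rank knowledge-base entries for a
    # mention by embedding similarity. Illustrative assumptions throughout.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.eval()

    def embed(texts):
        """Mean-pool the encoder's last hidden states into one vector per text."""
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state      # (B, T, H)
        mask = batch["attention_mask"].unsqueeze(-1)       # (B, T, 1)
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

    mention = "The patient was started on coumadin after the procedure."
    kb_entries = [  # hypothetical knowledge-base descriptions
        "Warfarin: an anticoagulant medication sold under the brand name Coumadin.",
        "Coumarin: a fragrant organic compound found in many plants.",
    ]
    scores = torch.nn.functional.cosine_similarity(embed([mention]), embed(kb_entries))
    print(kb_entries[int(scores.argmax())])
    ```

    In practice the encoder would be fine-tuned on linking annotations, which is exactly the data that is scarce in the low-annotation settings this thesis studies.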

    Computational approaches to semantic change (Volume 6)

    Semantic change, how the meanings of words change over time, has preoccupied scholars since well before modern linguistics emerged in the late 19th and early 20th centuries, an emergence that ushered in a new methodological turn in the study of language change. Compared with changes in sound and grammar, semantic change remains the least understood. Since then, the study of semantic change has progressed steadily, accumulating over more than a century a vast store of knowledge encompassing many languages and language families. Historical linguists also realized early on the potential of computers as research tools, presenting papers at the very first international conferences on computational linguistics in the 1960s. Such computational studies nonetheless tended to be small-scale, method-oriented, and qualitative. Recent years, however, have witnessed a sea change in this regard. Big-data empirical quantitative investigations are now coming to the forefront, enabled by enormous advances in storage capacity and processing power. Diachronic corpora have grown beyond imagination, defying exploration by traditional manual qualitative methods, and language technology has become increasingly data-driven and semantics-oriented. These developments present a golden opportunity for the empirical study of semantic change over both long and short time spans.
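    As one concrete instance of the big-data methods the volume surveys, a common recipe is to train separate word embeddings on time slices of a diachronic corpus and compare a word's nearest neighbours across periods. The sketch below assumes hypothetical corpus files and an illustrative target word; gensim's Word2Vec stands in for whatever embedding method a given study uses.

    ```python
    # Diachronic embedding sketch: one model per time slice, then compare
    # a word's neighbourhoods. Corpus paths and target word are made up.
    from gensim.models import Word2Vec
    from gensim.utils import simple_preprocess

    def train_slice(path):
        """Train an embedding model on one time slice (one sentence per line)."""
        with open(path, encoding="utf-8") as f:
            sentences = [simple_preprocess(line) for line in f]
        return Word2Vec(sentences, vector_size=100, window=5, min_count=5, epochs=10)

    model_1850 = train_slice("corpus_1850s.txt")  # hypothetical time-sliced corpora
    model_1990 = train_slice("corpus_1990s.txt")

    # "gay" is the classic example: its neighbours move from a 'cheerful'
    # cluster toward an 'identity' cluster between these two periods.
    for period, model in [("1850s", model_1850), ("1990s", model_1990)]:
        print(period, [w for w, _ in model.wv.most_similar("gay", topn=5)])
    ```

    A large shift in a word's neighbourhood between slices is the usual signal that its meaning has moved.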

    Language as Purposeful: Functional Varieties of Text, 2nd Edition

    This second edition of Language as Purposeful: Functional Varieties of Text, first published in 2004, is an across-the-board revision of that first edition – one motivated by our teaching and research experience over the years, but also by explicit student observations. The volume now offers an even more comprehensive introduction ‘about and around’ register theory and analysis. The theoretical input has been substantially fleshed out, as well as thoroughly reworked, as have the practical samples of register analysis. Further changes are detailed in the Preface to the new edition. But some things remain the same. Our approach to functional varieties of text is still, as it has always been, unapologetically Hallidayan. Indeed, today we are more than ever convinced that the ideal model for educating our non-native speakers (NNS) of English in language awareness is Halliday's functional grammar (FG, Halliday 1985/1994/2004/2014). The reasons for this are, of course, many. To begin with, what better tool could we use to carry on our relentless efforts to explode those die-hard myths that would cast the study of grammar as a boring and/or elitist enterprise, even a basically meaningless one? Indeed, FG sets its sights high: to “observe the humanity of our communication processes, not just their form” (Martin 2010: 1-2, our emphasis), or, as Christie puts it, to explore “some of the most important and pervasive of the processes by which human beings build their world” (1985/1989: v). We ultimately aim to guide our students to observing and exploring these processes. And one crucial way to do this is by furnishing them with the tools that FG provides for understanding how language use is not a minor or ‘neutral’ player in the social fields of everyday life (Williams 2016: 339), as well as – why not? – by encouraging them to investigate how such awareness can best be put to worthwhile social use. After all, FG is an exceptionally ‘appliable linguistics’ (e.g., Halliday 2002 [2009]: 3), one that successfully challenges the boundaries between theory and practice. And of course, as Halliday insists, “the value of a theory lies in the use that can be made of it” (1985b: 7).