79 research outputs found
Abstract syntax as interlingua: Scaling up the grammatical framework from controlled languages to robust pipelines
Abstract syntax is an interlingual representation used in compilers. Grammatical Framework (GF) applies the abstract syntax idea to natural languages. The development of GF started in 1998, first as a tool for controlled language implementations, where it has gained an established position in both academic and commercial projects. GF provides grammar resources for over 40 languages, enabling accurate generation and translation, as well as grammar engineering tools and components for mobile and Web applications. On the research side, the focus in the last ten years has been on scaling up GF to wide-coverage language processing. The concept of abstract syntax offers a unified view of many other approaches: Universal Dependencies, WordNets, FrameNets, Construction Grammars, and Abstract Meaning Representations. This makes it possible for GF to utilize data from the other approaches and to build robust pipelines. In return, GF can contribute to data-driven approaches with methods to transfer resources from one language to others, to augment data by rule-based generation, to check the consistency of hand-annotated corpora, and to pipe analyses into high-precision semantic back ends. This article gives an overview of the use of abstract syntax as interlingua through both established and emerging NLP applications involving GF.
Deep Learning-Based Knowledge Injection for Metaphor Detection: A Comprehensive Review
The history of metaphor research also marks the evolution of knowledge
infusion research. With the continued advancement of deep learning techniques
in recent years, the natural language processing community has shown great
interest in applying knowledge to achieve strong results in metaphor recognition
tasks. Although there has been a gradual increase in the number of approaches
involving knowledge injection in the field of metaphor recognition, there is a
lack of a complete review article on knowledge injection based approaches.
Therefore, the goal of this paper is to provide a comprehensive review of
research advances in the application of deep learning for knowledge injection
in metaphor recognition tasks. In this paper, we systematically summarize and
generalize the mainstream knowledge and knowledge injection principles, as well
as review the datasets, evaluation metrics, and benchmark models used in
metaphor recognition tasks. Finally, we explore the current issues facing
knowledge injection methods and provide an outlook on future research
directions.
Comment: 15 pages
The EAGLES/ISLE initiative for setting standards: the Computational Lexicon Working Group for Multilingual Lexicons
ISLE (International Standards for Language Engineering), a transatlantic standards-oriented initiative under the Human Language Technology (HLT) programme, is a continuation of the long-standing EAGLES (Expert Advisory Group for Language Engineering Standards) initiative, carried out by European and American groups within the EU-US International Research Co-operation, supported by NSF and EC. The objective is to support HLT R&D international and national projects, and HLT industry, by developing and promoting widely agreed and urgently demanded HLT standards and guidelines for infrastructural language resources, tools, and HLT products. ISLE targets the areas of multilingual computational lexicons (MCL), natural interaction and multimodality (NIMM), and evaluation. For MCL, ISLE is working to: extend EAGLES work on lexical semantics, necessary to establish inter-language links; design standards for multilingual lexicons; develop a prototype tool to implement lexicon guidelines; create EAGLES-conformant sample lexicons and tag corpora for validation purposes; develop standardised evaluation procedures for lexicons. For NIMM, a rapidly innovating domain urgently requiring early standardisation, ISLE work is targeted to develop guidelines for: creation of NIMM data resources; interpretative annotation of NIMM data, including spoken dialogue; annotation of discourse phenomena. For evaluation, ISLE is working on: quality models for machine translation systems; maintenance of previous guidelines, in an ISO-based framework. In this paper we concentrate on the Computational Lexicon Working Group, describing in detail the proposals of guidelines for the "Multilingual ISLE Lexical Entry" (MILE). We highlight some methodological principles applied in previous EAGLES work and followed in defining MILE. We also provide a description of the EU SIMPLE semantic lexicons built on the basis of previous EAGLES recommendations. 
Their importance lies in the fact that these lexicons are now being enlarged to real-size lexicons within National Projects in 8 EU countries, thus building a really large infrastructural platform of harmonised lexicons in Europe. We also stress the relevance of standardised language resources for humanities applications. Numerous theories, approaches, and systems are taken into account in ISLE, as any recommendation for harmonisation must build on the major contemporary approaches. Results will be widely disseminated, after validation in collaboration with EU and US HLT R&D projects, and industry. EAGLES work towards de facto standards has already allowed the field of Language Resources to establish broad consensus on key issues for some well-established areas, and will allow similar consensus to be achieved for other important areas through the ISLE project, thus providing a key opportunity for further consolidation and a basis for technological advance. Previous EAGLES results in many areas have in fact already become widely adopted de facto standards, and EAGLES itself is a well-known trademark and a point of reference for HLT projects.
Hosted by the Scholarly Text and Imaging Service (SETIS), the University of Sydney Library, and the Research Institute for Humanities and Social Sciences (RIHSS), the University of Sydney
Acquiring and Harnessing Verb Knowledge for Multilingual Natural Language Processing
Advances in representation learning have enabled natural language processing models to derive non-negligible linguistic information directly from text corpora in an unsupervised fashion. However, this signal is underused in downstream tasks, where models tend to fall back on superficial cues and heuristics to solve the problem at hand. Further progress relies on identifying and filling the gaps in linguistic knowledge captured in their parameters. The objective of this thesis is to address these challenges, focusing on the issues of resource scarcity, interpretability, and lexical knowledge injection, with an emphasis on the category of verbs.
To this end, I propose a novel paradigm for efficient acquisition of lexical knowledge leveraging native speakers’ intuitions about verb meaning to support development and downstream performance of NLP models across languages. First, I investigate the potential of acquiring semantic verb classes from non-experts through manual clustering. This subsequently informs the development of a two-phase semantic dataset creation methodology, which combines semantic clustering with fine-grained semantic similarity judgments collected through spatial arrangements of lexical stimuli. The method is tested on English and then applied to a typologically diverse sample of languages to produce the first large-scale multilingual verb dataset of this kind. I demonstrate its utility as a diagnostic tool by carrying out a comprehensive evaluation of state-of-the-art NLP models, probing representation quality across languages and domains of verb meaning, and shedding light on their deficiencies. Subsequently, I directly address these shortcomings by injecting lexical knowledge into large pretrained language models. I demonstrate that external manually curated information about verbs’ lexical properties can support data-driven models in tasks where accurate verb processing is key. Moreover, I examine the potential of extending these benefits from resource-rich to resource-poor languages through translation-based transfer. The results emphasise the usefulness of human-generated lexical knowledge in supporting NLP models and suggest that time-efficient construction of lexicons similar to those developed in this work, especially in under-resourced languages, can play an important role in boosting their linguistic capacity.
ESRC Doctoral Fellowship [ES/J500033/1], ERC Consolidator Grant LEXICAL [648909]
Report on first selection of resources
The central objective of the Metanet4u project is to contribute to the establishment of a pan-European digital platform that makes available language resources and services, encompassing both datasets and software tools, for speech and language processing, and supports a new generation of exchange facilities for them.
Peer Reviewed, Preprint
Developing a large scale FrameNet for Italian - The IFrameNet experience
In this thesis we present the development and the current status of the IFrameNet project, aimed at the construction of a large-scale lexical semantic resource for the Italian language based on Frame Semantics theories. We begin by situating our work in the wider context of Frame Semantics and of the FrameNet project, which, since 1997, has attempted to apply these theories to lexicography. We then analyse and discuss the applicability of the structure of the American resource to Italian, focusing more specifically on the domain of fear, worry, and anxiety. Finally, we propose some modifications aimed at improving the coherence of this domain of the resource and its ability to accurately represent the linguistic reality, in particular so that it can be applied to Italian.