34 research outputs found

    Proceedings of the 6th Joint ISO-ACL SIGSEM Workshop on Interoperable Semantic Annotation (ISA-6)

    Accessing natural history: Discoveries in data cleaning, structuring, and retrieval

    Dimensions of communication

    Domains and functions: A two-dimensional account of discourse markers

    Discourse markers and their functions have been modeled in a large number of very diverse frameworks. Most of these models target written language and the discourse relations that hold between sentences. In this paper, we present, assess and apply a new annotation taxonomy that targets discourse markers (rather than discourse relations) in spoken language and handles their polyfunctionality in a different way. Its main innovation is to distinguish two independent layers of semantic-pragmatic information (domains and functions) which, once combined, provide a fine-grained disambiguation of discourse markers. We compare the affordances of this model with those of existing proposals and illustrate them with a corpus study. A sample of conversational French containing 423 discourse marker tokens was fully analyzed by two independent annotators. We report inter-annotator agreement scores, as well as quantitative analyses of the distribution of domains and functions in the sample. Both powerful and economical, this proposal advocates a flexible and modular approach to discourse analysis and paves the way for further corpus-based studies of the challenging category of discourse markers.
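The abstract reports inter-annotator agreement for two annotators but does not name the coefficient. As a minimal sketch, assuming Cohen's kappa (a common choice for two annotators, not confirmed by the source), the code below scores agreement separately on each layer and on the combined domain-function label. The labels `D1`, `F1` and so on, the five tokens, and the helper `layer` are hypothetical placeholders, not the paper's tagset or data.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned")
    n = len(a)
    # Observed agreement: share of items that received the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement estimated from each annotator's own label distribution.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of five marker tokens as (domain, function) pairs.
ann1: List[Tuple[str, str]] = [("D1", "F1"), ("D2", "F2"), ("D1", "F3"), ("D3", "F1"), ("D2", "F2")]
ann2: List[Tuple[str, str]] = [("D1", "F1"), ("D2", "F3"), ("D1", "F3"), ("D2", "F1"), ("D2", "F2")]


def layer(anns: List[Tuple[str, str]], i: int) -> List[str]:
    """Project the annotations onto one layer (0 = domain, 1 = function)."""
    return [pair[i] for pair in anns]


kappa_domain = cohen_kappa(layer(ann1, 0), layer(ann2, 0))
kappa_function = cohen_kappa(layer(ann1, 1), layer(ann2, 1))
kappa_combined = cohen_kappa(ann1, ann2)  # tuples compared as single labels

if __name__ == "__main__":
    print(f"domain:   {kappa_domain:.3f}")    # 0.667
    print(f"function: {kappa_function:.3f}")  # 0.706
    print(f"combined: {kappa_combined:.3f}")  # 0.524
```

Scoring each layer independently matches the idea of two separate layers. In this toy data, agreement on the combined label is lower than on either layer alone, because a disagreement on either layer counts as a disagreement on the pair.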

    Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods and new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; and (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in the Journal of Artificial Intelligence Research (JAIR), volume 61, pp. 75-170. 118 pages, 8 figures, 1 table.

    Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021

    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26 to 28 January 2022. After the 2020 edition, which was held fully online because of the Covid-19 health emergency, CLiC-it 2021 was the first opportunity for the Italian computational linguistics research community to meet in person after more than a year of full or partial lockdown.

    Wiktionary: The Metalexicographic and the Natural Language Processing Perspective

    Dictionaries are the main reference works for our understanding of language. They are used by humans and by computational methods alike. So far, compiling dictionaries has almost exclusively been the profession of expert lexicographers. The ease of collaboration on the Web and the growing number of initiatives for collecting openly licensed knowledge, such as Wikipedia, have given rise to a new type of dictionary, created voluntarily by large communities of Web users. This collaborative construction approach is a new paradigm for lexicography: on the one hand it poses new research questions for dictionary research, and on the other it provides a very valuable knowledge source for natural language processing applications. The subject of our research is Wiktionary, currently the largest collaboratively constructed dictionary project. In the first part of this thesis, we study Wiktionary from the metalexicographic perspective. Metalexicography is the scientific study of lexicography, including the analysis and criticism of dictionaries and lexicographic processes. In this area, we make three contributions: (i) We first provide a detailed analysis of Wiktionary, its various language editions and its dictionary structures. (ii) We then analyze the collaborative construction process of Wiktionary. Our results show that the traditional phases of the lexicographic process do not apply well to Wiktionary, so we propose a novel process description based on the frequent and continual revision and discussion of the dictionary articles and the lexicographic instructions. (iii) We perform a large-scale quantitative comparison of Wiktionary with a number of other dictionaries regarding the covered languages, lexical entries, word senses, pragmatic labels, lexical relations, and translations.
We conclude the metalexicographic part by finding that the collaborative Wiktionary is not an appropriate replacement for expert-built dictionaries, owing to its inconsistencies, quality flaws, one-size-fits-all approach, and strong dependence on expert-built dictionaries. However, Wiktionary's rapid and continual growth, its high coverage of languages, newly coined words, domain-specific vocabulary and non-standard language varieties, and the evidence it draws from its authors' intuition offer promising opportunities for both lexicography and natural language processing. In particular, we find that Wiktionary and expert-built wordnets and thesauri contain largely complementary entries. In the second part of the thesis, we study Wiktionary from the natural language processing perspective, with the aim of making its linguistic knowledge available to computational applications. Such applications require large amounts of high-quality structured data. Expert-built resources have been found to suffer from insufficient coverage and high construction and maintenance costs, whereas fully automatic extraction from corpora or the Web often yields resources of limited quality. Collaboratively built encyclopedias are a viable solution, but they do not adequately cover the kind of linguistic knowledge found in dictionaries. We therefore propose extracting linguistic knowledge from Wiktionary, which we achieve through three main contributions: (i) We propose the novel multilingual ontology OntoWiktionary, created by extracting and harmonizing the weakly structured dictionary articles in Wiktionary. A particular challenge in this process is the ambiguity of semantic relations and translations, which we resolve with automatic word sense disambiguation methods. (ii) We automatically align Wiktionary with WordNet 3.0 at the word sense level.
The largely complementary information from the two dictionaries yields an aligned resource with higher coverage and an enriched representation of word senses. (iii) We represent Wiktionary according to the ISO standard Lexical Markup Framework, which we adapt to the peculiarities of collaborative dictionaries. This standardized representation is of great importance for fostering the interoperability of resources and hence the dissemination of Wiktionary-based research. Our work thereby lays a foundation for the large-scale integrated resource UBY, which provides unified access to a number of standardized dictionaries through a shared web interface for human users and an application programming interface for natural language processing applications. In particular, a user can switch between, and combine, information from Wiktionary and other dictionaries without completely changing the software. Our final resource and the accompanying datasets and software are publicly available and can be used in many different natural language processing applications. In particular, the resource fills the gap between the small expert-built wordnets and the large amount of encyclopedic knowledge in Wikipedia. We provide a survey of previous work using Wiktionary, and we demonstrate the usefulness of our work in two case studies, on measuring verb similarity and on detecting cross-lingual marketing blunders, which draw on our Wiktionary-based resource and the results of our metalexicographic study. We conclude the thesis by emphasizing the usefulness of collaborative dictionaries when combined with expert-built resources, a combination that holds much untapped potential.