    Application of fuzzy sets in data-to-text system

    This PhD dissertation addresses the convergence of two distinct paradigms: fuzzy sets and natural language generation. The object of study is the integration of fuzzy set-derived techniques, which model imprecision and uncertainty in human language, into systems that generate textual information from numeric data, commonly known as data-to-text systems. The dissertation covers an extensive state-of-the-art review, potential convergence points, two real data-to-text applications that integrate fuzzy sets (in the meteorology and learning analytics domains), and a model that encompasses the most relevant elements of the linguistic description of data discipline and provides a framework for building and integrating fuzzy set-based approaches into natural language generation/data-to-text systems.

    A Knowledge Multidimensional Representation Model for Automatic Text Analysis and Generation: Applications for Cultural Heritage

    Knowledge is information that has been contextualized in a certain domain, where it can be used and applied. Natural language provides a very direct way to transfer knowledge at different levels of conceptual density. Advances in Natural Language Processing technologies therefore offer the opportunity to make the process of knowledge transfer more fluid and universal. Indeed, unfolding domain knowledge is one way to bring to larger audiences content that would otherwise be restricted to specialists. So far, this has been done entirely manually, through the skills of science communicators and popular science writers. Technology now provides a way to make this transfer both less expensive and more widespread. Extracting knowledge and then generating suitably communicable natural-language text from it are the two related subtasks that must be fulfilled to attain this general goal. To this end, two fields of information technology have reached the necessary maturity and can therefore be effectively combined. On the one hand, Information Extraction and Retrieval (IER) can extract knowledge from texts and map it into a neutral, abstract form, thereby freeing it from the stylistic constraints of the context in which it originated. From there, Natural Language Generation can take over, automatically or semi-automatically regenerating the extracted knowledge as texts targeting new communities. This doctoral thesis contributes to realizing this combination by defining and implementing a novel multidimensional model for the representation of conceptual knowledge, together with a workflow that can produce strongly customized textual descriptions. By exploiting paraphrase generation techniques and by profiling target users, applications, and domains, a target-driven approach is proposed to automatically generate multiple texts from the same information core.
An extended case study in the Cultural Heritage application domain is described to demonstrate the effectiveness of the proposed model and approach, to compare and position this contribution within the current state of the art, and to outline future directions.

    Low Resources Machine Translation

    METIS-II was an EU-FET MT project that ran from October 2004 to September 2007 and aimed at translating free text input without resorting to parallel corpora. The idea was to use ‘basic’ linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The METIS-II project had four partners, translating from their ‘home’ languages (Greek, Dutch, German, and Spanish) into English. The paper outlines the basic ideas of the project, their implementation, the resources used, and the results obtained. It also gives examples of how METIS-II has continued beyond its lifetime and its original scope. On the basis of the results and experience obtained, we believe that the approach is promising and offers potential for development in various directions.

    Entity Coherence for Descriptive Text Structuring

    Institute for Communicating and Collaborative Systems
    Although entity coherence, i.e. the coherence that arises from certain patterns of references to entities, is of attested importance for characterising a descriptive text structure, it remains unclear whether and how current formal models of entity coherence such as Centering Theory can be used for the purposes of natural language generation. This thesis investigates this issue and sets out to explore which of the many formulations of Centering best suits text structuring. In doing this, we assume text structuring to be a search task in which different orderings of propositions are evaluated according to scores assigned by a metric. The main question behind this study is how to choose, among many alternatives, a metric of entity coherence to serve as the sole guidance for the text structuring component of a system that produces descriptions of objects. Different ways of defining metrics of entity coherence using Centering’s notions are discussed, and a general corpus-based methodology is introduced to identify which of these metrics are the most promising candidates for search-based text structuring before the actual generation of the descriptive structure takes place. The performance of a large set of metrics is estimated empirically in a series of computational experiments using two kinds of data: (i) a reliably annotated corpus representing the genre of interest and (ii) data derived from an existing natural language generation system and ordered according to the instructions of a domain expert. A final experiment supplements our main methodology by automatically evaluating the best-scoring orderings of some of the best-performing metrics against an upper bound, defined by orderings produced by multiple experts on additional application-specific data, and a lower bound, defined by a random baseline.
The main findings are summarised as follows. In general, the simplest metric of entity coherence constitutes a very robust baseline for both datasets. However, when the metrics are modified according to an additional constraint on entity coherence, the baseline is beaten in domain (ii). The employed modification is supported by the subsidiary evaluation, which shows all employed metrics to be superior to the random baseline and helps identify the metric that overall constitutes the most suitable candidate (among the ones investigated) for search-based descriptive text structuring in domain (ii). This thesis provides substantial insight into the role of entity coherence as a descriptive text structuring constraint. Viewing Centering from an NLG perspective raises a series of interesting challenges that the thesis identifies and, to a certain extent, investigates. The general evaluation methodology and the results of the empirical studies are useful for any subsequent attempt to generate a descriptive text structure in an application that makes use of the notion of entity coherence as modelled by Centering.

    Adaptive hypertext and hypermedia : proceedings of the 2nd workshop, Pittsburgh, Pa., June 20-24, 1998

    Hybrid discourse modeling and summarization for a speech-to-speech translation system

    The thesis discusses two parts of the speech-to-speech translation system VerbMobil: the dialogue model and one of its applications, multilingual summary generation. In connection with the dialogue model, two topics are of special interest: (a) the use of a default unification operation called overlay as the fundamental operation for dialogue management; and (b) an intentional model that is able to describe intentions in dialogue on five levels in a language-independent way. Besides the generation algorithm itself, we present a comprehensive evaluation of the summarization functionality. In addition to precision and recall, a new measure, confabulation, is defined, which provides a more precise understanding of the performance of complex natural language processing systems.
    This thesis mainly addresses two topics developed for the VerbMobil system, a translation system for spontaneous spoken language: the dialogue model and, as an application, the multilingual generation of result protocols. Two topics are of particular interest for dialogue modelling. The first concerns a default unification operation called Overlay, formalized in this thesis, which serves as the fundamental operation for discourse processing. The second is an intentional model that represents the intentions of a dialogue on five levels in a language-independent representation. Alongside the generation algorithm developed for protocol generation, a comprehensive evaluation of the protocol generation functionality is presented. In addition to "precision" and "recall", a new measure, confabulation, is introduced, enabling a more precise characterization of the quality of a complex language processing system.

    Multimodal Reference

    Modelling aggregation motivated interactions in descriptive text generation
