
    Ideas Matchmaking for Supporting Innovators and Entrepreneurs

    In this paper we show a system able to crawl content from the Web related to entrepreneurship and technology, to be matched with ideas proposed by users in the Innovvoice platform. We argue that such a service is a valuable component of an ideabator platform, supporting innovators and potential entrepreneurs.
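    The abstract does not specify how ideas and crawled content are matched; as a rough, hypothetical illustration of such idea-to-content matchmaking (not the Innovvoice implementation), the sketch below ranks crawled items against an idea description by simple word overlap.

```python
import re

def tokens(text):
    """Lowercased word tokens, a deliberately simple text representation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rank_content(idea, crawled_items):
    """Rank crawled web items against an idea description by Jaccard word
    overlap. This is only a stand-in for whatever matching the platform uses."""
    idea_toks = tokens(idea)
    scored = []
    for title, body in crawled_items:
        item_toks = tokens(title + " " + body)
        score = len(idea_toks & item_toks) / len(idea_toks | item_toks)
        scored.append((score, title))
    return sorted(scored, reverse=True)

idea = "A mobile app matching student entrepreneurs with local mentors"
items = [
    ("Mentoring programmes for tech startups", "How mentors help early-stage founders..."),
    ("New battery technology announced", "A breakthrough in solid-state batteries..."),
]
for score, title in rank_content(idea, items):
    print(f"{score:.2f}  {title}")
```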

    University of Helsinki Department of Computer Science Annual Report 1998


    Towards Dynamic Composition of Question Answering Pipelines

    Question answering (QA) over knowledge graphs has gained significant momentum over the past five years due to the increasing availability of large knowledge graphs and the rising importance of question answering for user interaction. DBpedia has been the most prominently used knowledge graph in this setting. QA systems implement a pipeline connecting a sequence of QA components that translates an input question into its corresponding formal query (e.g. SPARQL); this query is executed over a knowledge graph in order to produce the answer to the question. Recent empirical studies have revealed that, albeit overall effective, the performance of QA systems and QA components depends heavily on the features of input questions, and not even the combination of the best-performing QA systems or individual QA components retrieves complete and correct answers. Furthermore, these QA systems cannot be easily reused or extended, and their results cannot be easily reproduced, since the systems are mostly implemented in a monolithic fashion, lack standardised interfaces, and are often not open source or available as Web services. These drawbacks of the state of the art prevent many of these approaches from being employed in real-world applications. In this thesis, we tackle the problem of QA over knowledge graphs and propose a generic approach to promote reusability and build question answering systems in a collaborative effort. First, we define the qa vocabulary and the Qanary methodology to develop an abstraction level over existing QA systems and components. Qanary relies on the qa vocabulary to establish guidelines for semantically describing the knowledge exchange between the components of a QA system. We implement a component-based modular framework called "Qanary Ecosystem" that applies the Qanary methodology to integrate several heterogeneous QA components in a single platform. We further present the Qaestro framework, which provides an approach to semantically describe question answering components and effectively enumerates QA pipelines based on a QA developer's requirements. Qaestro produces all valid combinations of the available QA components that respect the input-output requirements of each component to build QA pipelines. Finally, we address the scalability of QA components within a framework and propose a novel approach that chooses the best component per task to automatically build a QA pipeline for each input question. We implement this model within FRANKENSTEIN, a framework able to select QA components and compose pipelines. FRANKENSTEIN extends the Qanary Ecosystem and utilises the qa vocabulary for data exchange. It integrates 29 independent QA components implementing five QA tasks, resulting in 360 unique QA pipelines. Each approach proposed in this thesis (the Qanary methodology, Qaestro, and FRANKENSTEIN) is supported by extensive evaluation demonstrating its effectiveness. Our contributions target the broader research agenda of offering the QA community an efficient way of applying their research to a field which is driven by many different disciplines and consequently requires a collaborative approach to achieve significant progress in question answering.
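    The pipeline-composition idea summarised above can be illustrated with a small sketch. The component names, task order and annotation labels below are illustrative assumptions rather than the actual qa vocabulary, Qanary or Qaestro interfaces; the sketch only shows how input-output requirements constrain which components can be chained into a valid QA pipeline.

```python
from itertools import product

# Hypothetical component registry: each QA component declares the task it
# implements, the annotations it needs as input, and those it produces.
COMPONENTS = [
    {"name": "NED-A", "task": "entity_linking",   "needs": {"question"},               "adds": {"entities"}},
    {"name": "NED-B", "task": "entity_linking",   "needs": {"question"},               "adds": {"entities"}},
    {"name": "REL-A", "task": "relation_linking", "needs": {"question"},               "adds": {"relations"}},
    {"name": "QB-A",  "task": "query_building",   "needs": {"entities", "relations"},  "adds": {"sparql"}},
]

TASK_ORDER = ["entity_linking", "relation_linking", "query_building"]

def valid_pipelines():
    """Enumerate component sequences whose input requirements are satisfied
    by the annotations produced earlier in the pipeline."""
    per_task = [[c for c in COMPONENTS if c["task"] == t] for t in TASK_ORDER]
    for combo in product(*per_task):
        available = {"question"}          # the input question is always available
        ok = True
        for comp in combo:
            if not comp["needs"] <= available:
                ok = False
                break
            available |= comp["adds"]
        if ok:
            yield [c["name"] for c in combo]

for pipeline in valid_pipelines():
    print(" -> ".join(pipeline))
```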

    Natural language processing in CLIME, a multilingual legal advisory system

    This paper describes CLIME, a web-based legal advisory system with a multilingual natural language interface. CLIME is a 'proof-of-concept' system which answers queries relating to ship-building and ship-operating regulations. Its core knowledge source is a set of such regulations encoded as a conceptual domain model and a set of formalised legal inference rules. The system supports retrieval of regulations via the conceptual model, and assessment of the legality of a situation or activity on a ship according to the legal inference rules. The focus of this paper is on the natural language aspects of the system, which help the user to construct semantically complex queries using WYSIWYM technology, allow the system to produce extended and cohesive responses and explanations, and support the whole interaction through a hybrid synchronous/asynchronous dialogue structure. Multilinguality (English and French) is viewed simply as interface localisation: the core representations are language-neutral, and the system can present extended or local interactions in either language at any time. The development of CLIME featured a high degree of client involvement, and the specification, implementation and evaluation of natural language components in this context are also discussed.

    A Statistical, Grammar-Based Approach to Microplanning

    While there has been much work in recent years on data-driven natural language generation, little attention has been paid to the fine-grained interactions that arise during microplanning between aggregation, surface realization and sentence segmentation. In this paper, we propose a hybrid symbolic/statistical approach to jointly model these interactions. Our approach integrates a small handwritten grammar, a statistical hypertagger and a surface realization algorithm. It is applied to the verbalization of knowledge base queries and tested on 13 knowledge bases to demonstrate domain independence. We evaluate our approach in several ways. A quantitative analysis shows that the hybrid approach outperforms a purely symbolic approach in terms of both speed and coverage. Results from a human study indicate that users find the output of this hybrid statistical/symbolic system more fluent than both a template-based and a purely symbolic grammar-based approach. Finally, we illustrate by means of examples that our approach can account for various factors impacting aggregation, sentence segmentation and surface realization.
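    The interaction between aggregation, sentence segmentation and surface realization can be pictured with a toy sketch. The candidate verbalisations and the bigram-count scorer below are illustrative assumptions, not the paper's grammar, hypertagger or surface realizer; the point is only that a symbolic step proposes alternatives and a statistical step ranks them.

```python
# A toy illustration of joint microplanning choices: a handwritten "grammar"
# proposes alternative verbalisations of the same facts (differing in
# aggregation and sentence segmentation), and a statistical scorer ranks them.
# The candidate strings and the scoring function are illustrative assumptions.

def grammar_candidates(facts):
    """Return alternative verbalisations of the facts (symbolic step)."""
    subj, (p1, o1), (p2, o2) = facts
    return [
        f"{subj} {p1} {o1}. {subj} {p2} {o2}.",   # no aggregation, two sentences
        f"{subj} {p1} {o1} and {p2} {o2}.",       # aggregated into one sentence
    ]

def fluency_score(text, bigram_counts):
    """Score a candidate with toy bigram counts (statistical step)."""
    words = text.lower().replace(".", " .").split()
    return sum(bigram_counts.get((a, b), 0) for a, b in zip(words, words[1:]))

facts = ("Berlin", ("is the capital of", "Germany"), ("has", "3,700,000 inhabitants"))
counts = {("germany", "and"): 2, ("and", "has"): 3, (".", "berlin"): 1}

best = max(grammar_candidates(facts), key=lambda c: fluency_score(c, counts))
print(best)  # the aggregated variant wins under these toy counts
```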

    Semi-automatic acquisition of domain-specific semantic structures.

    Siu, Kai-Chung. Thesis (M.Phil.), Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 99-106). Abstracts in English and Chinese. Table of contents:
    Chapter 1  Introduction
      1.1  Thesis Outline
    Chapter 2  Background
      2.1  Natural Language Understanding
        2.1.1  Rule-based Approaches
        2.1.2  Stochastic Approaches
        2.1.3  Phrase-Spotting Approaches
      2.2  Grammar Induction
        2.2.1  Semantic Classification Trees
        2.2.2  Simulated Annealing
        2.2.3  Bayesian Grammar Induction
        2.2.4  Statistical Grammar Induction
      2.3  Machine Translation
        2.3.1  Rule-based Approach
        2.3.2  Statistical Approach
        2.3.3  Example-based Approach
        2.3.4  Knowledge-based Approach
        2.3.5  Evaluation Method
    Chapter 3  Semi-Automatic Grammar Induction
      3.1  Agglomerative Clustering
        3.1.1  Spatial Clustering
        3.1.2  Temporal Clustering
        3.1.3  Free Parameters
      3.2  Post-processing
      3.3  Chapter Summary
    Chapter 4  Application to the ATIS Domain
      4.1  The ATIS Domain
      4.2  Parameters Selection
      4.3  Unsupervised Grammar Induction
      4.4  Prior Knowledge Injection
      4.5  Evaluation
        4.5.1  Parse Coverage in Understanding
        4.5.2  Parse Errors
        4.5.3  Analysis
      4.6  Chapter Summary
    Chapter 5  Portability to Chinese
      5.1  Corpus Preparation
        5.1.1  Tokenization
      5.2  Experiments
        5.2.1  Unsupervised Grammar Induction
        5.2.2  Prior Knowledge Injection
      5.3  Evaluation
        5.3.1  Parse Coverage in Understanding
        5.3.2  Parse Errors
      5.4  Grammar Comparison Across Languages
      5.5  Chapter Summary
    Chapter 6  Bi-directional Machine Translation
      6.1  Bilingual Dictionary
      6.2  Concept Alignments
      6.3  Translation Procedures
        6.3.1  The Matching Process
        6.3.2  The Searching Process
        6.3.3  Heuristics to Aid Translation
      6.4  Evaluation
        6.4.1  Coverage
        6.4.2  Performance
      6.5  Chapter Summary
    Chapter 7  Conclusions
      7.1  Summary
      7.2  Future Work
        7.2.1  Suggested Improvements on Grammar Induction Process
        7.2.2  Suggested Improvements on Bi-directional Machine Translation
        7.2.3  Domain Portability
      7.3  Contributions
    Bibliography
    Appendix A  Original SQL Queries
    Appendix B  Induced Grammar
    Appendix C  Seeded Categories

    An Online Syntactic-Semantic Framework for Extracting Lexical Relations with a Deterministic Natural Language Model

    Given the extraordinary growth in online documents, methods for automated extraction of semantic relations became popular and, shortly after, necessary. This thesis proposes a new deterministic language model, with an associated artifact, which acts as an online Syntactic and Semantic Framework (SSF) for the extraction of morphosyntactic and semantic relations. The model covers all fundamental linguistic fields: morphology (formation, composition, and word paradigms), lexicography (storing words and their features in network lexicons), syntax (the composition of words into meaningful units: phrases, sentences, and pragmatics), and semantics (determining the meaning of phrases). To achieve this, a new tagging system with more complex structures was developed. Instead of the commonly used vectored systems, this tagging system uses tree-like T-structures with hierarchical grammatical Word of Speech (WOS) and Semantic of Word (SOW) tags. For relation extraction, it was necessary to develop a syntactic (sub)model of the language, which ultimately is the foundation for performing semantic analysis. This was achieved by introducing a new 'O-structure', which represents the union of WOS/SOW features from the T-structures of words and enables the creation of syntagmatic patterns. Such patterns are a powerful mechanism for the extraction of conceptual structures (e.g., metonymies, similes, or metaphors), breaking sentences into main and subordinate clauses, or detecting a sentence's main construction parts (subject, predicate, and object). Since all program modules are developed as general and generative entities, SSF can be used for any of the Indo-European languages, although validation and network lexicons have been developed for the Croatian language only. The SSF has three types of lexicons (morphs/syllables, words, and multi-word expressions), and the main word lexicon is included in the global Linguistic Linked Open Data (LLOD) cloud, allowing interoperability with all other world languages. The SSF model and its artifact represent a complete natural language model which can be used to extract lexical relations from single sentences, paragraphs, and large collections of documents.
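    As a rough illustration of the tagging scheme described above (and not the thesis's actual T-structure or O-structure definitions, which are hierarchical and far richer), the sketch below assumes flat WOS/SOW tag sets and shows how a syntagmatic pattern could be matched against the union of a word's features.

```python
from dataclasses import dataclass, field

# Rough illustration only: simplified stand-ins for the thesis's T- and
# O-structures. Real WOS/SOW tags are hierarchical; here they are flat sets.

@dataclass
class TStructure:
    word: str
    wos: set = field(default_factory=set)   # grammatical tags (Word of Speech)
    sow: set = field(default_factory=set)   # semantic tags (Semantic of Word)

def o_structure(t: TStructure) -> set:
    """Union of WOS and SOW features, used to match syntagmatic patterns."""
    return t.wos | t.sow

# A toy syntagmatic pattern: subject (noun, agent) + predicate (verb) + object (noun).
PATTERN = [{"noun", "agent"}, {"verb"}, {"noun"}]

def matches(pattern, tagged_words):
    """Check that each required feature set is contained in the
    corresponding word's O-structure."""
    if len(pattern) != len(tagged_words):
        return False
    return all(req <= o_structure(t) for req, t in zip(pattern, tagged_words))

sentence = [
    TStructure("Ana",   wos={"noun"}, sow={"agent", "person"}),
    TStructure("reads", wos={"verb"}, sow={"activity"}),
    TStructure("books", wos={"noun"}, sow={"object"}),
]
print(matches(PATTERN, sentence))  # True under these toy tags
```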