76 research outputs found

    Approximate text generation from non-hierarchical representations in a declarative framework

    This thesis is on Natural Language Generation. It describes a linguistic realisation system that translates the semantic information encoded in a conceptual graph into an English language sentence. The use of a non-hierarchically structured semantic representation (conceptual graphs) and an approximate matching between semantic structures allows us to investigate a more general version of the sentence generation problem where one is not pre-committed to a choice of the syntactically prominent elements in the initial semantics. We show clearly how the semantic structure is declaratively related to linguistically motivated syntactic representation; we use D-Tree Grammars, which stem from work on Tree-Adjoining Grammars. The declarative specification of the mapping between semantics and syntax allows for different processing strategies to be exploited. A number of generation strategies have been considered: a pure top-down strategy and a chart-based generation technique which allows partially successful computations to be reused in other branches of the search space. Having a generator with increased paraphrasing power as a consequence of using non-hierarchical input and approximate matching raises the issue of whether certain 'better' paraphrases can be generated before others. We investigate preference-based processing in the context of generation.

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp. 75-170. 118 pages, 8 figures, 1 table.

    Syntax and semantics of adjectives in Portuguese: analysis and modeling

    Doctoral thesis, Linguistics (Computational Linguistics), Universidade de Lisboa, Faculdade de Letras, 2010. Available in the document. Fundação para a Ciência e Tecnologia (SFRH/BD/8524/2002)

    Shell nouns: in a systemic functional linguistics perspective

    Doctoral thesis, Linguistics (Discourse Analysis), Universidade de Lisboa, Faculdade de Letras, 2015. Shell nouns in a Systemic Functional Linguistics perspective. The aim of this thesis is to develop an account of shell nouns (Schmid, 2000) in a Systemic Functional Linguistics (SFL) perspective. Using a parallel corpus comprising five article submissions by Portuguese academics in the field of economics and five published articles on comparable topics, the ideational, interpersonal and textual functions of shell nouns are tagged at the strata of the lexicogrammar and discourse semantics using Corpus Tool version 2.7.4 (O’Donnell, 2008). The systems networks used to tag the corpus are grounded in SFL theory. The analysis shows that shell nouns constitute an important systemic resource for the writers of research articles, who need to build an argument, positioning themselves and their study to convince the discourse community that their paper makes a contribution to knowledge in their disciplinary field. They enable a text to unfold by compacting information realised as a clause or more elsewhere in the text. Thus they can help scaffold a text through hyper-Themes, hyper-News and internal conjunction. At the stratum of the lexicogrammar, anaphorically referring nominal groups with a shell noun as Head often compose Theme, where they constitute a shared point of departure for the clause. In a decoding relational clause whose Process is realised by a verb such as reveal, confirm, or suggest, an anaphorically referring shell noun that construes Token helps to explicitly build the writer’s argument. Shell nouns that construe the field of research, such as results and findings, are common in this function. Mental, linguistic and factual shell nouns contribute to construing dialogic position, and coupling between interpersonal systems and textual systems enables the writer to align the reader with certain positions and disalign with others.
Although most shell nouns are not field specific, because they can project a figure that instantiates an entity, they contribute to construing field, for example instantiating entities as the object of study of the empirical research. The capacity of shell nouns to function as described above derives from their status as semiotic abstractions, which can refer to text as fact or report and are grammatical metaphors. They can be seen as lying at the intersection of modality and the logico-semantic relations of projection and expansion, brought into being by the semogenic process of nominalisation. The writers of the published articles and article submissions are found to use shell nouns in all of the functions above, but there are differences in the relative shares of the functions, which may affect reader reactions to the text.

    The expression of “collectivity” in Romance languages

    This empirical and comparative study examines various facets of the linguistic expression of collectivity in Romance languages. Against the background of an onomasiological conception of nominal aspectuality, collection nouns are analysed with regard to their morphosyntactic and derivational properties, as well as to their diachronic paths of evolution. Special focus is laid on uncountable object mass nouns.

    Morphosyntax of Katcha nominals: a Dynamic Syntax account

    This thesis presents a new description and theoretical analysis of the nominal system of Katcha (Nilo-Saharan, Kadu), spoken in the Nuba Mountains of Sudan. The description and analysis are based on a synthesis of data from several sources, including unpublished archive material and original fieldwork. The study is placed in context with a discussion of the demographic, cultural and political background affecting the Katcha linguistic community, a review of the current state of linguistic research on Katcha and a discussion of the ongoing controversy over the place of the Kadu languages within the language phyla of Africa. The morphosyntactic descriptions first focus on the role of nominals as heads, considering phenomena such as classification, agreement and modification. It is shown that Katcha has an unusual system of gender agreement with three agreement classes based on the concepts of Masculine, Feminine and Plural and that the gender of a noun may change between its singular and plural forms. Surprisingly, these phenomena are both most commonly found in Afro-Asiatic, which is not a phylum to which Kadu has previously been ascribed. The gender changes are shown to be predictable, determined by number-marking affixes. The study then gives a unified analysis of various types of nominal modifiers: relative clauses, possessives, demonstratives and adjectives all display similar morphological properties, and this is accounted for by analysing all modifiers as appositional, headed by a demonstrative pronoun. This analysis of modifiers shows them to be related to, though not the same as, the notions of relative markers and construct state found widely in African languages. The role of nominals within sentential argument structure is then considered, with discussion of phenomena such as prepositional phrases, case and verbal valency.
From the interaction of prepositions and pronouns, it is tentatively concluded that Katcha has three cases: Nominative, Accusative and Oblique. From the interaction of verbs and nouns, it is demonstrated that the verbal suffixes known as ‘verb extensions’ primarily serve to license the absence of otherwise mandatory core arguments. The second part of the thesis provides a theoretical analysis of the nominal system within the framework of Dynamic Syntax (DS). Two key features of the DS formalism come into play. Firstly, DS construes semantic individuals as terms of the epsilon calculus. Verb extensions are analysed as projecting context-dependent epsilon terms, providing a value for the ‘missing’ argument. Secondly, DS allows information sharing between propositions by means of a ‘LINK’ relation. Prepositional phrases are analysed as projecting a subordinate proposition which shares an argument with the matrix tree. These two formal tools come together in the analysis of nominal modifiers, which are construed as projecting an arbitrarily complex epsilon term LINKed to some term in the matrix tree, directly reflecting their descriptive analysis as appositional nominals. In presenting new data for a little-studied language, this thesis adds to our knowledge and understanding of Nuba Mountain languages. In describing and analysing some of the typologically unusual features of Katcha’s nominal system, it challenges some standard assumptions about these constructions and about the genetic affiliation of the Kadu family. And in the theoretical analysis it demonstrates the suitability of Dynamic Syntax to model some of the key insights of the descriptive analysis.

    Multiword expressions

    Multiword expressions (MWEs) are a challenge for both natural language applications and linguistic theory because they often defy the application of the machinery developed for free combinations, where the default is that the meaning of an utterance can be predicted from its structure. There is a rich body of primarily descriptive work on MWEs for many European languages, but there is little comparative work. The volume brings together MWE experts to explore the benefits of a multilingual perspective on MWEs. The ten contributions in this volume look at MWEs in Bulgarian, English, French, German, Maori, Modern Greek, Romanian, Serbian, and Spanish. They discuss prominent issues in MWE research such as classification of MWEs, their formal grammatical modeling, and the description of individual MWE types from the point of view of different theoretical frameworks, such as Dependency Grammar, Generative Grammar, Head-driven Phrase Structure Grammar, Lexical Functional Grammar, Lexicon Grammar.

    Formal Linguistic Models and Knowledge Processing. A Structuralist Approach to Rule-Based Ontology Learning and Population

    2013 - 2014. The main aim of this research is to propose a structuralist approach for knowledge processing by means of ontology learning and population, starting from unstructured and structured texts. The method suggested includes distributional semantic approaches and NL formalization theories, in order to develop a framework that relies upon deep linguistic analysis... [edited by author] XIII n.s.

    Designing Statistical Language Learners: Experiments on Noun Compounds

    The goal of this thesis is to advance the exploration of the statistical language learning design space. In pursuit of that goal, the thesis makes two main theoretical contributions: (i) it identifies a new class of designs by specifying an architecture for natural language analysis in which probabilities are given to semantic forms rather than to more superficial linguistic elements; and (ii) it explores the development of a mathematical theory to predict the expected accuracy of statistical language learning systems in terms of the volume of data used to train them. The theoretical work is illustrated by applying statistical language learning designs to the analysis of noun compounds. Both syntactic and semantic analysis of noun compounds are attempted using the proposed architecture. Empirical comparisons demonstrate that the proposed syntactic model is significantly better than those previously suggested, approaching the performance of human judges on the same task, and that the proposed semantic model, the first statistical approach to this problem, exhibits significantly better accuracy than the baseline strategy. These results suggest that the new class of designs identified is a promising one. The experiments also serve to highlight the need for a widely applicable theory of data requirements. Comment: PhD thesis (Macquarie University, Sydney; December 1995), LaTeX source, xii+214 pages.

    Exploring the use of parallel corpora in the compilation of specialised bilingual dictionaries of technical terms: a case study of English and isiXhosa

    Text in English. Abstracts in English, isiXhosa and Afrikaans. The Constitution of the Republic of South Africa, Act 108 of 1996, mandates the state to take practical and positive measures to elevate the status and the use of indigenous languages. The implementation of this pronouncement resulted in a growing demand for specialised translations in fields like technology, science, commerce, law and finance. The lack of terminology and resources such as specialised bilingual dictionaries in indigenous languages, particularly isiXhosa, remains a growing concern that hinders the translation and the intellectualisation of isiXhosa. A growing number of African scholars affirm the importance of specialised dictionaries in the African languages as tools for language and terminology development so that African languages can be used in the areas of science and technology. In the light of the background above, this study explored how parallel corpora can be interrogated using a bilingual concordancer, ParaConc, to extract bilingual terminology that can be used to create specialised bilingual dictionaries. A corpus-based approach was selected due to its speed, efficiency and accuracy in extracting bilingual terms in their immediate contexts. In enhancing the research outcomes, Descriptive Translation Studies (DTS) and Corpus-based Translation Studies (CTS) were used in a complementary manner. Because the study is interdisciplinary, the function theories of lexicography that emphasise the function and needs of users were also applied. The analysis and extraction of bilingual terminology for dictionary making was successful through the use of the following ParaConc features, namely frequencies, hot word lists, hot words, search facility and concordances (Key Word in Context), among others.
The findings revealed that the English-isiXhosa Parallel Corpus is a repository of translation equivalents and other information categories that can make specialised dictionaries more user-friendly and multifunctional. The frequency lists were revealed as an effective method of selecting headwords for inclusion in a dictionary. The results also unraveled the complex functions of bilingual concordances, where information on collocations and multiword units, sense distinction and usage examples could be easily identified, showing that this approach is more efficient than the traditional method. The study contributes to the knowledge on corpus-based lexicography, standardisation of finance terminology, resource development and the making of user-friendly dictionaries that are tailor-made for different needs of users.
Umgaqo-siseko weli loMzantsi Afrika ukhululele uRhulumente ukuba athabathe amanyathelo abonakalayo ekuphuhliseni nasekuphuculeni iilwimi zesiNtu. Esi sindululo sibangele ukwanda kokuguqulelwa kwamaxwebhu angezobuchwepheshe, inzululwazi, umthetho, ezemali noqoqosho angesiNgesi eguqulelwa kwiilwimi ebezifudula zingasiwe-so ezinjengesiXhosa. Ukunqongophala kwesigama kunye nezichazi-magama kube yingxaki enkulu ekuguquleleni ngakumbi izichazi-magama ezilwimi-mbini eziqulethe isigama esikhethekileyo. Iingcali ezininzi ziyangqinelana ukuba olu hlobo lwezi zichazi-magama luyimfuneko kuba ludlala iindima enkulu ekuphuhlisweni kweelwimi zesiNtu, ekuyileni isigama, nasekusetyenzisweni kwazo kumabakala obunzululwazi nobuchwepheshe. Olu phando ke luvavanya ukusetyenziswa kwekhophasi equlethe amaxwebhu esiNgesi neenguqulelo zawo zesiXhosa njengovimba wokudimbaza isigama sezemali esinokunceda ekuqulunqweni kwesichazi-magama esilwimi-mbini.
Isizathu esibangele ukukhetha le ndlela yophando esebenzisa ikhompyutha kukuba iyakhawuleza, ulwazi oluthathwe kwikhophasi luchanekile, yaye isigama kwikhophasi singqamana ngqo nomxholo wamaxwebhu nto leyo eyenza kube lula ukufumana iintsingiselo nemizekelo ephilayo. Ukutyebisa olu phando indlela yekhophasi iye yaxhaswa zezinye iindlela zophando ezityunjiweyo: ufundo lwenguguqulelo oluchazayo (DTS) kunye neendlela zokuguqulela ezijoliswe kumsebenzi nakuhlobo lwabasebenzisi zinguqulelo ezo. Kanti ke ziqwalaselwe neenkqubo zophando lobhalo-zichazi-magama eziinjongo zokuqulunqa izichazi-magama ezesebenzisekayo neziluncedo kuninzi lwabasebenzisi zichazi-magama ngakumbi kwisizwe esisebenzisa iilwimi ezininzi. Ukuhlalutya nokudimbaza isigama kwikhophasi kolu phando kusetyenziswe isixhobo sekhompyutha esilungiselelwe ikhophasi enelwiimi ezimbini nangaphezulu ebizwa ngokuba yiParaConc. Iziphumo zolu phando zibonise mhlophe ukuba ikhophasi eneenguqulelo nguvimba weendidi ngendidi zamagama nolwazi olunokuphucula izichazi-magama zeli xesha. Kaloku abaguquleli basebenzise amaqhinga ngamaqhinga ukunika iinguqulelo bekhokelwa yimigomo nemithetho yoguqulelo enxuse abasebenzisi bamaxwebhu aguqulelweyo. Ubuchule beParaConc bokukwazi ukuhlela amagama ngokwendlela afumaneka ngayo kunye neenkcukacha zamanani budandalazise indlela eyiyo yokukhetha imichazwa enokungena kwisichazi-magama. Iziphumo zikwabonakalise iintlaninge yolwazi olufumaneka kwiKWIC, lwazi olo olungelula ukulufumana xa usebenzisa undlela-ndala wokwakha isichazi-magama. Esi sifundo esihlanganyele uGuqulelo olusekelwe kwiKhophasi noQulunqo-zichazi-magama zobuchwepheshe luya kuba negalelo elingathethekiyo kwindlela yokwakha izichazi-magama kwilwiimi zeSintu ngokubanzi nancakasana kwisiXhosa, nto leyo eya kothula umthwalo kubaqulunqi-zichazi-magama. 
Ukwakha nokuqulunqa izichazi-magama ezilwimi-mbini zezemali kuya kwandisa imithombo yesigama esinqongopheleyo kananjalo sivelise izichazi-magama eziluncedo kwisininzi sabantu.
Die Grondwet van die Republiek van Suid-Afrika, Wet 108 van 1996, gee aan die staat die mandaat om praktiese en positiewe maatreëls te tref om die status en gebruik van inheemse tale te verhoog. Die implementering van hierdie uitspraak het gelei tot ’n toenemende vraag na gespesialiseerde vertalings in domeine soos tegnologie, wetenskap, handel, regte en finansies. Die gebrek aan terminologie en hulpbronne soos gespesialiseerde woordeboeke in inheemse tale, veral Xhosa, wek toenemende kommer wat die vertaling en die intellektualisering van Xhosa belemmer. ’n Toenemende aantal vakkundiges in Afrika beklemtoon die belangrikheid van gespesialiseerde woordeboeke in die Afrikatale as instrumente vir taal- en terminologie-ontwikkeling sodat Afrikatale gebruik kan word in die areas van wetenskap en tegnologie. In die lig van die voorafgaande agtergrond het hierdie studie ondersoek ingestel na hoe parallelle korpora deursoek kan word deur ’n tweetalige konkordanser (ParaConc) te gebruik om tweetalige terminologie te ontgin wat gebruik kan word in die ontwikkeling van tweetalige gespesialiseerde woordeboeke. ’n Korpusgebaseerde benadering is gekies vir die spoed, doeltreffendheid en akkuraatheid waarmee dit tweetalige terme uit hulle onmiddellike kontekste kan onttrek. Beskrywende Vertaalstudies (DTS) en Korpusgebaseerde Vertaalstudies (CTS) is op ’n aanvullende wyse gebruik om die navorsingsuitkomste te verbeter. Aangesien die studie interdissiplinêr is, is die funksieteorieë van leksikografie wat die funksie en behoeftes van gebruikers beklemtoon, ook toegepas.
Die analise en ontginning van tweetalige terminologie om woordeboeke te ontwikkel was suksesvol deur, onder andere, gebruik te maak van die volgende ParaConc-eienskappe, naamlik, frekwensies, hotword-lyste, hot words, die soekfunksie en konkordansies (Sleutelwoord-in-Konteks). Die bevindings toon dat ’n Engels-Xhosa Parallelle Korpus ’n bron van vertaalekwivalente en ander inligtingskategorieë is wat gespesialiseerde woordeboeke meer gebruikersvriendelik en multifunksioneel kan maak. Die frekwensielyste is geïdentifiseer as ’n doeltreffende metode om hoofwoorde te selekteer wat opgeneem kan word in ’n woordeboek. Die bevindings het ook die komplekse funksies van tweetalige konkordansers ontknoop waar inligting oor kollokasies en veelvuldigewoord-eenhede, betekenisonderskeiding en gebruiksvoorbeelde maklik identifiseer kon word wat aandui dat hierdie metode doeltreffender is as die tradisionele metode. Die studie dra by tot die kennisveld van korpusgebaseerde leksikografie, standaardisering van finansiële terminologie, hulpbronontwikkeling en die ontwikkeling van gebruikersvriendelike woordeboeke wat doelgemaak is vir verskillende behoeftes van gebruikers.
Linguistics and Modern Languages. D. Litt. et Phil. (Linguistics (Translation Studies))