
    A corpus-based contrastive analysis of modal adverbs of certainty in English and Urdu

    This study uses the corpus-based contrastive approach to explore the syntactic patterns and semantic and pragmatic meanings of modal adverbs of certainty (MACs) in English and Urdu. MACs are a descriptive category of epistemic modal adverbs that semantically express a degree of certainty. Because research on Urdu MACs is scarce, the study draws on the existing literature on English MACs for a cross-linguistic description of the characteristics of English and Urdu MACs. A framework is constructed based on Boye’s (2012) description of the syntactic characteristics of MACs, in terms of clause type and position within the clause, and on Simon-Vandenbergen and Aijmer’s (2007) description of their functional characteristics, including both semantic (e.g. certainty, possibility) and pragmatic (e.g. authority, politeness) functions. Following Boye’s (2012) model, MACs may be grouped according to meaning: high certainty support – HCS (e.g. certainly); probability support – PS (e.g. perhaps); probability support for negative content – PSNC (e.g. perhaps not); and high certainty support for negative content – HCSNC (e.g. certainly not). Methodologically, the framework identified as suitable primarily follows earlier studies that used corpus-based methods and parallel and comparable corpora for the cross-linguistic comparative or contrastive analysis of a linguistic element or pattern. An approach to grammatical description based on works such as Quirk et al. (1985) and Biber et al. (1999) is likewise identified as suitable for this study. An existing parallel corpus (EMILLE) and newly created comparable monolingual corpora of English and Urdu are used. The new comparable corpora are web-based and consist of news and chat-forum texts; the data are POS-tagged. Using the parallel corpus, the Urdu equivalents of the English MACs previously identified in the literature are identified. 
Then, the comparable corpora are used to extract data on the relative frequencies of MACs and their distribution across text types. This quantitative analysis shows that in both languages all four semantic categories of MAC occur in all text types, but their distribution across text types is not uniform. HCS MACs, although diverse, are considerably less frequent than PS MACs in both English and Urdu. HCSNC and PSNC MACs are notably rarer than HCS and PS MACs in both languages. The analysis reveals striking similarities, with minor differences, in the syntactic positioning of MACs in English and Urdu. Except for Urdu PSNC MACs, all categories occur most frequently in clause-medial position, in both independent and dependent clauses, in both languages. The exception arises because hō nahīṁ saktā ‘possibly not’ is most frequent in clause-final position. MACs in both languages most often have scope over the whole clause in which they occur. Semantically, the core function of MACs is to express the speaker’s certainty and high confidence (HCS and HCSNC) or low certainty and low confidence (PS and PSNC) in the truth of a proposition; these groups thus function primarily as certainty markers and probability markers, respectively. In both languages, speakers also use MACs as short responses to questions and in responses to their own rhetorical questions. HCS and PS MACs in clause-final position may additionally function as tags that prompt a response from the interlocutor. When they co-occur with modal verbs, MACs emphasise or downtone, but do not entirely change, the modal verb’s epistemic or deontic meaning. In both languages, all MACs preferentially occur in the then-clause of a conditional sentence. Pragmatically, MACs are used for emphasis, expectation, counter-expectation and politeness. Additionally, HCS and HCSNC MACs are used to express solidarity and authority, and PS and PSNC MACs are used as hedges. 
Readings of expectation, hedging, politeness and solidarity may be relevant simultaneously. Interestingly, reduplication for emphasis, although common in Urdu, is observed for only one Urdu MAC, żarūr ‘definitely’, whereas all English MACs are reduplicated for emphasis in at least some cases. Another difference is that in Urdu the sequence śāyad nahīṁ yaqīnān ‘not perhaps, certainly’ expresses speaker authority in a response to a previous speaker, whereas no English MAC exhibits this behaviour. Despite the overall similarity, minor differences between English and Urdu MACs are observable in their use as replies to questions and within interrogative clauses. This analysis supports the contention that, cross-linguistically and despite linguistic variation, the conceptual structures and functional-communicative considerations that shape natural languages are largely universal. The study makes two main contributions. First, its descriptive analysis of English and Urdu MACs using a corpus-based contrastive method not only illuminates this specific question in modality but also sets a precedent for future corpus-based descriptive studies of Urdu. Second, it brings modal adverbs of certainty and modal adverbs of possibility, previously considered distinct categories, together in a single category of modal adverbs used to express a degree of certainty, i.e. MACs. From a practical standpoint, an additional contribution is the creation and open release of a large Urdu corpus designed for comparable corpus research, the Lancaster Urdu Web Corpus, which fills a need for such a corpus in the field.
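The relative-frequency step described in this abstract can be sketched as follows: count MAC matches per semantic category and normalize by corpus size for each text type. This is a minimal sketch under stated assumptions, not the study's procedure: the input is assumed to be whitespace-tokenized, `MAC_LEXICON` is a hypothetical toy English lexicon (the study's inventory is larger and also covers Urdu), normalization per million words is a common corpus-linguistic convention rather than a base the abstract states, and POS tags, clause position and scope, which the study also analyses, are ignored. Longest-match lookup lets "certainly not" count as HCSNC rather than HCS, although a surface string alone will not always identify the intended reading.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: MAC forms (as lowercase token tuples) mapped to
# Boye's (2012) categories. Not the study's actual inventory.
MAC_LEXICON = {
    ("certainly", "not"): "HCSNC",
    ("perhaps", "not"): "PSNC",
    ("certainly",): "HCS",
    ("definitely",): "HCS",
    ("perhaps",): "PS",
    ("probably",): "PS",
}
MAX_LEN = max(len(form) for form in MAC_LEXICON)


def count_macs(tokens):
    """Count MAC categories in a token list, preferring the longest match."""
    counts = Counter()
    i = 0
    while i < len(tokens):
        for n in range(MAX_LEN, 0, -1):
            form = tuple(t.lower() for t in tokens[i:i + n])
            if len(form) == n and form in MAC_LEXICON:
                counts[MAC_LEXICON[form]] += 1
                i += n
                break
        else:  # no lexicon entry starts at position i
            i += 1
    return counts


def per_million(texts):
    """Map (text_type, tokens) pairs to {text_type: {category: freq per million words}}."""
    totals = defaultdict(int)
    hits = defaultdict(Counter)
    for text_type, tokens in texts:
        totals[text_type] += len(tokens)
        hits[text_type].update(count_macs(tokens))
    return {
        tt: {cat: n * 1_000_000 / totals[tt] for cat, n in hits[tt].items()}
        for tt in totals
    }


if __name__ == "__main__":
    sample = [
        ("news", "It will certainly not rain , perhaps".split()),
        ("forum", "perhaps yes probably".split()),
    ]
    print(per_million(sample))
```

Categories with no hits are simply absent from a text type's dictionary; a fuller version would report them as zero so that distributions across text types can be compared directly.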

    Head-Driven Phrase Structure Grammar

    Head-Driven Phrase Structure Grammar (HPSG) is a constraint-based or declarative approach to linguistic knowledge, which analyses all descriptive levels (phonology, morphology, syntax, semantics, pragmatics) with feature-value pairs, structure sharing, and relational constraints. In syntax it assumes that expressions have a single relatively simple constituent structure. This volume provides a state-of-the-art introduction to the framework. Various chapters discuss basic assumptions and formal foundations, describe the evolution of the framework, and go into the details of the main syntactic phenomena. Further chapters are devoted to non-syntactic levels of description. The book also considers related fields and research areas (gesture, sign languages, computational linguistics) and includes chapters comparing HPSG with other frameworks (Lexical Functional Grammar, Categorial Grammar, Construction Grammar, Dependency Grammar, and Minimalism).
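To make "feature-value pairs" concrete, here is a minimal sketch of unifying untyped feature structures represented as nested Python dicts, an operation commonly used in implementations of constraint-based grammars. It is an illustration only, not HPSG's formal foundation: HPSG uses typed feature structures and structure sharing (re-entrancy), neither of which this toy function models, and the features in the example (`HEAD`, `POS`, `AGR`, `PER`, `NUM`) are hypothetical simplifications.

```python
def unify(a, b):
    """Unify two untyped feature structures given as nested dicts.

    Atomic values are strings and unify only if identical. Returns the
    merged structure, or None if the two structures conflict.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so the input is not mutated
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # conflicting values somewhere below
                result[feature] = merged
            else:
                result[feature] = value  # feature only present in b
        return result
    # At least one side is atomic: succeed only on identical values.
    return a if a == b else None


if __name__ == "__main__":
    # Hypothetical agreement requirement a finite verb places on its subject.
    subj_requirement = {"AGR": {"PER": "3", "NUM": "sg"}}
    she = {"HEAD": {"POS": "noun"}, "AGR": {"PER": "3", "NUM": "sg"}}
    they = {"HEAD": {"POS": "noun"}, "AGR": {"PER": "3", "NUM": "pl"}}
    print(unify(subj_requirement, she))   # compatible: merged structure
    print(unify(subj_requirement, they))  # NUM clash: None
```

Because unification only ever adds information or fails, the result does not depend on the order in which compatible constraints are combined, which is one reason constraint-based frameworks describe grammars declaratively.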

    Studies in the linguistic sciences. 17-18 (1987-1988)


    Between syntax and morphology

    Synopsis: This volume collects novel contributions to comparative generative linguistics that “rethink” existing approaches to an extensive range of phenomena, domains, and architectural questions in linguistic theory. At the heart of the contributions is the tension between descriptive and explanatory adequacy which has long animated generative linguistics and which continues to grow thanks to the increasing amount and diversity of data available to us. The chapters address research questions in comparative morphosyntax, including the modelling of syntactic categories, relative clauses, and demonstrative systems. Many of these contributions show the influence of research by Ian Roberts and collaborators and give the reader a sense of the lively nature of current discussion of topics in morphosyntax and morphosyntactic variation. This book is complemented by volume I available at https://langsci-press.org/catalog/book/275 and volume III available at https://langsci-press.org/catalog/book/277

    Syntactic architecture and its consequences II

    This volume collects novel contributions to comparative generative linguistics that “rethink” existing approaches to an extensive range of phenomena, domains, and architectural questions in linguistic theory. At the heart of the contributions is the tension between descriptive and explanatory adequacy which has long animated generative linguistics and which continues to grow thanks to the increasing amount and diversity of data available to us. The chapters address research questions in comparative morphosyntax, including the modelling of syntactic categories, relative clauses, and demonstrative systems. Many of these contributions show the influence of research by Ian Roberts and collaborators and give the reader a sense of the lively nature of current discussion of topics in morphosyntax and morphosyntactic variation. This book is complemented by volume I available at https://langsci-press.org/catalog/book/275 and volume III available at https://langsci-press.org/catalog/book/277

    Perspectives on information structure in Austronesian languages

    Information structure is a relatively new field in linguistics and has only recently been studied for smaller and less-described languages. This book is the first to bring together contributions on information structure in Austronesian languages. Current approaches from formal semantics, discourse studies and intonational phonology are combined with language-specific and cross-linguistic expertise on Austronesian languages. The 13 chapters in this volume cover all subgroups of the large Austronesian family, including Formosan, Central Malayo-Polynesian, South Halmahera-West New Guinea and Oceanic. The main focus, though, is on Western Malayo-Polynesian languages. Some chapters investigate two of the largest languages in the region (Tagalog and different varieties of Malay); others study information-structural phenomena in small, underdescribed languages. The book covers three overarching topics: NP marking and reference-tracking devices, syntactic structures and information-structural categories, and the interaction of information structure and prosody. The studies are based on various types of data. Some chapters investigate written texts, such as modern novels (cf. Djenar’s chapter on modern standard Indonesian), or compare different text genres, for example oral narratives and translations of biblical narratives (cf. De Busser’s chapter on Bunun). Most contributions, however, study natural spoken speech and use spoken corpora compiled by the authors themselves. The volume draws on a number of different methods and theoretical frameworks. Two chapters use the Question Under Discussion approach developed in formal semantics (cf. the chapters by Latrouite & Riester and by Shiohara & Riester).
Riesberg et al. apply the recently developed method of Rapid Prosody Transcription (RPT) to investigate native speakers’ perception of prosodic prominences and boundaries in Papuan Malay. Other papers discuss the theoretical consequences of their findings. For example, Himmelmann takes apart the most widespread framework for intonational phonology (ToBI) and argues that the analysis of Indonesian languages requires much simpler assumptions than those underlying the standard model. Arka & Sedeng ask how fine-grained the information-structure space should be conceptualized and modelled, e.g. in LFG. Schnell argues that elements that could be analysed as “topic” and “focus” categories are better described in terms of ‘packaging’ and do not necessarily reflect any pragmatic roles in the first place.

    Diachrony of differential argument marking

    While some languages code a particular grammatical role (e.g. subject or direct object) in one and the same way across the board, many more languages code the same grammatical roles differentially. The variables that condition differential argument marking (DAM) pertain to various properties of the NP (such as animacy or definiteness), to event semantics, or to various properties of the clause. While current research on DAM is mainly synchronic, this volume takes a diachronic perspective. Its central tenet is that the emergence and development of differential marking systems provide a different kind of evidence for understanding the phenomenon. The volume consists of 18 chapters and primarily brings together diachronic case studies on particular languages or language groups, including Finno-Ugric, Sino-Tibetan and Japonic languages. It also includes a position paper, which provides an overview of the typology of different subtypes of DAM systems, a chapter on computer simulation of the emergence of DAM, and a chapter devoted to the cross-linguistic effects of referential hierarchies on DAM.

    Empirical machine translation and its evaluation

    In this thesis we have exploited current Natural Language Processing technology for Empirical Machine Translation and its Evaluation. On the one hand, we have studied the problem of automatic MT evaluation. We have analyzed the main deficiencies of current evaluation methods, which arise, in our opinion, from the shallow quality principles on which they are based. Instead of relying on the lexical dimension alone, we suggest a novel path towards heterogeneous evaluations. Our approach is based on the design of a rich set of automatic metrics intended to capture a wide variety of translation quality aspects at different linguistic levels (lexical, syntactic and semantic). These linguistic metrics have been evaluated in different scenarios. The most notable finding is that metrics based on deeper linguistic information (syntactic/semantic) produce more reliable system rankings than metrics limited to the lexical dimension, especially when the systems under evaluation differ in nature. However, at the sentence level, some of these metrics suffer a significant decrease in performance, which is mainly attributable to parsing errors. To improve sentence-level evaluation, apart from backing off to lexical similarity in the absence of parsing, we have also studied the possibility of combining the scores conferred by metrics at different linguistic levels into a single measure of quality. Two valid non-parametric strategies for metric combination have been presented. 
These offer the important advantage of not having to adjust the relative contribution of each metric to the overall score. As a complementary issue, we show how to use the heterogeneous set of metrics to obtain automatic and detailed linguistic error analysis reports. On the other hand, we have studied the problem of lexical selection in Statistical Machine Translation. For that purpose, we have constructed a Spanish-to-English baseline phrase-based Statistical Machine Translation system and iterated across its development cycle, analyzing how to improve its performance through the incorporation of linguistic knowledge. First, we have extended the system by combining shallow-syntactic translation models based on linguistic data views, which yields a significant improvement. This system is further enhanced using dedicated discriminative phrase translation models. These models allow for a better representation of the translation context in which phrases occur, effectively yielding an improved lexical choice. However, based on the proposed heterogeneous evaluation methods and on manual evaluations, we have found that improvements in lexical selection do not necessarily imply an improved overall syntactic or semantic structure. The incorporation of dedicated predictions into the statistical framework therefore requires further study. As a side question, we have studied one of the main criticisms against empirical MT systems, i.e. their strong domain dependence, and how its negative effects may be mitigated by properly combining external knowledge sources when porting a system to a new domain. 
We have successfully ported an English-to-Spanish phrase-based Statistical Machine Translation system trained on the political domain to the domain of dictionary definitions. The two parts of this thesis are tightly connected, since the hands-on development of an actual MT system has allowed us to experience first-hand the role of the evaluation methodology in the development cycle of MT systems.
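The abstract mentions non-parametric metric-combination strategies whose advantage is that no per-metric weight has to be tuned. One simple scheme with that property is an equally weighted average of normalized metric scores, sketched below. This is a hedged illustration, not the thesis's exact strategy: min-max normalization across the systems being compared is an assumption made here so that metrics on different scales can be averaged, and the metric names and scores in the example are made up.

```python
def normalize(scores):
    """Min-max normalize a list of scores to [0, 1]; a constant list maps to 0.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def uniform_combination(metric_scores):
    """Combine several metrics with equal weight, so no weights need tuning.

    metric_scores maps a metric name to a list of scores, one per system,
    with every list in the same system order. Returns one combined score
    per system.
    """
    if not metric_scores:
        raise ValueError("need at least one metric")
    normalized = [normalize(scores) for scores in metric_scores.values()]
    n_metrics = len(normalized)
    # zip(*...) regroups the normalized scores system by system.
    return [sum(per_system) / n_metrics for per_system in zip(*normalized)]


if __name__ == "__main__":
    # Made-up scores for three hypothetical systems under two metrics.
    scores = {
        "lexical_metric": [0.25, 0.75, 0.5],
        "syntactic_metric": [0.5, 0.25, 0.75],
    }
    print(uniform_combination(scores))
```

The trade-off of this design is that every metric counts equally regardless of how reliable it is at a given level; normalizing across systems also means the combined scores are only meaningful relative to the set of systems being ranked.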