Search CORE

13 research outputs found

Review of Morphosyntactic Categories and the Expression of Possession

Author: Sebastian Sulger
Publication venue: CSLI Publications
Publication date: 01/02/2016

This paper is a review of the book Morphosyntactic Categories and the Expression of Possession, a collection of 11 papers, all dealing with the cross-linguistic realization of the concept of possession through various morphosyntactic constructions. After a short introduction, the review summarizes the contents of the volume and provides an evaluation.

Journal of South Asian Linguistics

Modeling Nominal Predications in Hindi/Urdu

Author: Sebastian Sulger
Publication date: 01/01/2015

The identification and classification of nominal predicators and their arguments is a notorious problem in natural language processing (NLP). Semantic reasoning, information retrieval, question answering and other applications can benefit greatly from a successful treatment of nominal predication (Meyers et al., 2004b). Whereas a considerable amount of work in NLP focuses on the identification and annotation of verbal predication and its arguments, there is less research on the types and identification of nominal predicates and their arguments. This thesis is a contribution of the latter type. It focuses on the description and analysis of nominal arguments in the South Asian language Hindi/Urdu. The analysis is couched within the theory of Lexical-Functional Grammar (LFG, Bresnan, 2001, Dalrymple, 2001) and is implemented in a computational grammar, the Urdu ParGram grammar (Butt and King, 2007), which is part of the ParGram (“Parallel Grammar”) project on parallel LFG grammar engineering (Butt et al., 1999a, 2002, 1999b). The implementation makes use of the grammar development platform XLE (Crouch et al., 2015).
Different types of case-marked nominal arguments in Hindi/Urdu are examined: genitive, locative, and instrumental arguments. Among these, the genitive is a special case marker in that it shows morphological agreement with the head noun (Butt and King, 2004b). Each of these argument types is discussed in detail with regard to the case-marking strategies employed, their general linear order within the noun phrase, their selection by distinct types of head nominals, their overall functional behavior, and their binding properties.
All types of nominal arguments exhibit scrambling and can occur outside of the noun phrases in which they are licensed; this is attributed to the tight correlation between the case marking and the thematic role realized by the argument and, in the case of the genitive, to the morphosyntactic agreement between the case marker and the head noun. In addition, all types of nominal arguments may regularly undergo argument suppression, in which the argument is not realized but is existentially bound (Barker, 1995); an analysis of these cases in terms of pronominal drop is considered but rejected. The thesis also reviews possessive and locative clauses, which are analyzed as intransitives and copula clauses, respectively. These novel analyses are shown to account better for the functional behavior, binding patterns, and overall structural paradigms of these clauses than previous analyses.
The thesis also includes a discussion of noun-verb complex predicates in Hindi/Urdu. Here, nouns (referred to as “nominal hosts”) and verbs (also called “light verbs”) form a single predicate with a single set of grammatical functions, whereas the argument structure is complex, as the nominal host may itself contribute arguments to the overall predication (Mohanan, 1994). While the construction itself is comparatively well understood theoretically, the combinatory possibilities between nominal hosts and light verbs are not: some hosts occur with several light verbs, others with only a subset of them, and still others with a single light verb only. Two corpus studies are discussed that aim at 1) uncovering the constraints on combining hosts with light verbs and 2) creating a lexical resource that can serve as input to NLP applications. One such application is the Urdu ParGram grammar, where it is shown how the results of the corpus studies can be translated into templates that model the combinatory patterns in terms of statistical (dis)preferences.
From the point of view of grammar development, the thesis argues that a unified account of genitive arguments, as currently employed in ParGram, cannot be maintained. Instead, the thesis proposes a more detailed approach that can successfully account for the observed patterns. This connects to the issue of parallelism in ParGram. Conventions developed within the ParGram grammars are extensive and dictate the form and possible values of the features used in the grammars as well as the type of analysis chosen for a particular construction. Grammar writers are in principle only allowed to abandon parallelism if maintaining it would come at the cost of misrepresenting the linguistic facts (Butt and King, 2007, Butt et al., 1999b, King et al., 2005). Being faithful to the facts of nominal predication in Hindi/Urdu entails abandoning the ParGram analysis of possessives, which does not distinguish between different types of possessives (Dipper, 2003).
On the other hand, the implementation profits from the detailed linguistic analysis applied within ParGram and uses an array of notational instruments in XLE that model the generalizations accurately. The complete grammar (in its most recent version as of the submission of the thesis) is included on the CD-ROM attached to this document; the implementation can also be tested using the online INESS platform for treebanking and LFG grammar testing (Rosén et al., 2012a,b) (the INESS homepage is located at: http://iness.uib.no).

KOPS - The Institutional Repository of the University of Konstanz

Analytic and Synthetic Verb Forms in Irish - An Agreement-Based Implementation in LFG

Author: Sebastian Sulger
Publication date: 03/04/2020

This paper discusses the phenomenon of analytic and synthetic verb forms in Modern Irish, which results in a widespread system of morphological blocking. I present data from Modern Irish, then briefly discuss two earlier theoretical approaches. I introduce an alternative, agreement-based solution, involving 1) a finite-state morphological analyzer for verb forms implemented using the FST toolse…

CiteSeerX

Extracting and Classifying Urdu Multiword Expressions

Authors: Annette Hautli, Sebastian Sulger
Publication date: 01/01/2011

This paper describes a method for automatically extracting and classifying multiword expressions (MWEs) for Urdu on the basis of a relatively small unannotated corpus (around 8.12 million tokens). The MWEs are extracted by an unsupervised method and classified into two distinct classes, namely locations and person names. The classification is based on simple heuristics that take the co-occurrence of MWEs with distinct postpositions into account. The resulting classes are evaluated against a hand-annotated gold standard and achieve F-scores of 0.5 and 0.746 for locations and persons, respectively. A target application is the Urdu ParGram grammar, where MWEs are needed to generate a more precise syntactic and semantic analysis.
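The abstract says only that classification relies on "simple heuristics" over the co-occurrence of MWEs with postpositions. The Python sketch below illustrates one way such a heuristic could work and is not the authors' method: the cue sets LOCATION_CUES and PERSON_CUES, the romanized tokens, and the functions postposition_counts and classify are all hypothetical. It counts which word directly follows each occurrence of a candidate MWE and assigns the class whose cue postpositions dominate.

```python
from collections import Counter

# Hypothetical cue sets (romanized Urdu postpositions); NOT taken from the paper.
LOCATION_CUES = {"mein", "par", "tak"}
PERSON_CUES = {"ne", "ko"}


def postposition_counts(sentences, mwe):
    """Count the token immediately following each occurrence of `mwe`.

    `sentences` is a list of token lists; `mwe` is a tuple of tokens.
    Only occurrences that have a following token are counted.
    """
    n = len(mwe)
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n):
            if tuple(tokens[i:i + n]) == mwe:
                counts[tokens[i + n]] += 1
    return counts


def classify(sentences, mwe, min_count=1):
    """Label `mwe` as 'location', 'person' or 'unknown' by cue frequency."""
    counts = postposition_counts(sentences, mwe)
    loc = sum(counts[p] for p in LOCATION_CUES)
    per = sum(counts[p] for p in PERSON_CUES)
    # Too little evidence, or a tie, yields no decision.
    if max(loc, per) < min_count or loc == per:
        return "unknown"
    return "location" if loc > per else "person"
```

Returning "unknown" on ties or sparse evidence is a design choice for the sketch; a real system would also need a cue inventory and thresholds tuned against a gold standard.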

KOPS - The Institutional Repository of the University of Konstanz

CiteSeerX

A Reference Dependency Bank for Analyzing Complex Predicates

Authors: Tafseer Ahmed, Miriam Butt, Annette Hautli, Sebastian Sulger
Publication date: 01/01/2012

When dealing with languages of South Asia from an NLP perspective, a problem that repeatedly crops up is the treatment of complex predicates. This paper presents a first approach to the analysis of complex predicates (CPs) in the context of dependency bank development. The effort originates in theoretical work on CPs done within Lexical-Functional Grammar (LFG), but is intended to provide a guideline for analyzing different types of CPs in an independent framework. Although we focus on CPs in Hindi and Urdu, the design of the dependencies is kept general enough to account for CP constructions across languages.

KOPS - The Institutional Repository of the University of Konstanz

CiteSeerX

Automatic detection of causal relations in German multilogs

Authors: Miriam Butt, Tina Bögel, Annette Hautli-Janisz, Sebastian Sulger
Publication date: 01/01/2014

This paper introduces a linguistically motivated, rule-based annotation system for causal discourse relations in transcripts of spoken multilogs in German. The overall aim is an automatic means of determining the degree of justification provided by a speaker in the delivery of an argument in a multiparty discussion. The system comprises two parts: a disambiguation module, which differentiates causal connectors from their other senses, and a discourse relation annotation system, which marks the spans of text that constitute the reason and the result/conclusion expressed by the causal relation. The system is evaluated against a gold standard of German transcribed spoken dialogue. The results show that our system performs reliably on both tasks.

KOPS - The Institutional Repository of the University of Konstanz

Identifying Urdu Complex Predication via Bigram Extraction

Authors: Tafseer Ahmed, Miriam Butt, Tina Bögel, Annette Hautli, Sebastian Sulger
Publication date: 01/01/2012

A problem that crops up repeatedly in shallow and deep syntactic parsing approaches to South Asian languages like Urdu/Hindi is the proper treatment of complex predications. Complex predications pose problems for NLP because of their productivity and the ill-understood range of their combinatorial possibilities. This paper investigates whether fine-grained information about the distributional properties of nouns in N+V complex predicates (CPs) can be identified by the comparatively simple process of extracting bigrams from a large “raw” corpus of Urdu. In gathering the relevant properties, we were aided by visual analytics: we coupled our computational data analysis with interactive visual components in the analysis of the large data sets. The visualization component proved to be an essential part of our data analysis, particularly for the easy visual identification of outliers and false positives. Another essential component turned out to be our language-particular knowledge and access to existing language-particular resources. Overall, we were indeed able to identify high-frequency N+V complex predications as well as pick out combinations we had not been aware of before. However, a manual inspection of our results also pointed to a problem of data sparsity, despite the use of a large corpus.
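To make the bigram-extraction idea concrete, here is a minimal Python sketch, assuming a tokenized and lemmatized corpus and a hand-picked list of light-verb lemmas. The names LIGHT_VERBS, nv_bigrams and combination_profile, the romanized lemmas, and the frequency threshold are illustrative assumptions, not details from the paper (which also relies on visual analytics and manual inspection). It collects every bigram whose second word is a light verb and then records, per candidate host, which light verbs it combines with.

```python
from collections import Counter, defaultdict

# Hypothetical light-verb lemmas (romanized); assumes a lemmatized corpus.
LIGHT_VERBS = {"karna", "hona", "dena", "lena"}


def nv_bigrams(sentences, light_verbs=LIGHT_VERBS):
    """Count (word, light_verb) bigrams as candidate N+V complex predicates."""
    counts = Counter()
    for tokens in sentences:
        # Pair each token with its right neighbour.
        for w1, w2 in zip(tokens, tokens[1:]):
            if w2 in light_verbs:
                counts[(w1, w2)] += 1
    return counts


def combination_profile(counts, min_freq=2):
    """Map each candidate host to the light verbs it occurs with at least
    `min_freq` times, together with the bigram counts."""
    profile = defaultdict(dict)
    for (host, verb), c in counts.items():
        if c >= min_freq:
            profile[host][verb] = c
    return dict(profile)
```

Raising min_freq trades recall for precision: with a sparse corpus, genuine but rare host/light-verb combinations fall below the threshold, which mirrors the data-sparsity problem reported in the abstract.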

KOPS - The Institutional Repository of the University of Konstanz

CiteSeerX

Urdu/Hindi modals

Authors: Miriam Butt, Tina Bögel, Annette Hautli, Rajesh Bhatt, Sebastian Sulger
Publication date: 01/01/2011

In this paper we survey the various ways of expressing modality in Urdu/Hindi and show that Urdu/Hindi modals provide interesting insights on current discussions of the semantics of modality. There are very few dedicated modals in Urdu/Hindi; most modal meanings are instead arrived at constructionally, via a combination of a particular kind of verb, a particular embedded verb form, and a particular case. Among the range of constructions yielded by such combinations, there is evidence for a two-place modal operator in addition to the one-place operator usually assumed in the literature. We also discuss instances of the Actuality Entailment, which had been shown to be sensitive to aspect but in Urdu/Hindi appears to be sensitive to aspect only some of the time, depending on the type of modal verb. Indeed, following recent proposals by Ramchand (2011), we end up with a purely lexical account of modality and the Actuality Entailment, rather than the structural one put forward by Hacquard (2010).

KOPS - The Institutional Repository of the University of Konstanz
