    A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena

    Word reordering is one of the most difficult aspects of statistical machine translation (SMT), and an important factor in its quality and efficiency. Despite the vast amount of research published to date, the interest of the community in this problem has not decreased, and no single method appears to be strongly dominant across language pairs. Instead, the choice of the optimal approach for a new translation task still seems to be mostly driven by empirical trials. To orientate the reader in this vast and complex research area, we present a comprehensive survey of word reordering viewed as a statistical modeling challenge and as a natural language phenomenon. The survey describes in detail how word reordering is modeled within different string-based and tree-based SMT frameworks and as a stand-alone task, including systematic overviews of the literature in advanced reordering modeling. We then question why some approaches are more successful than others in different language pairs. We argue that, besides measuring the amount of reordering, it is important to understand which kinds of reordering occur in a given language pair. To this end, we conduct a qualitative analysis of word reordering phenomena in a diverse sample of language pairs, based on a large collection of linguistic knowledge. Empirical results in the SMT literature are shown to support the hypothesis that a few linguistic facts can be very useful to anticipate the reordering characteristics of a language pair and to select the SMT framework that best suits them. Comment: 44 pages, to appear in Computational Linguistics.

    Serbo-Croat Clitics and Word Grammar

    Serbo-Croat has a complex system of clitics which raise interesting problems for any theory of the interface between syntax and morphology. After summarising the data we review previous analyses (mostly within the generative tradition), all of which are unsatisfactory in various ways. We then explain how Word Grammar handles clitics: as words whose form is an affix rather than the usual ‘word-form’. Like other affixes, clitics need a word to accommodate them, but in the case of clitics this is a special kind of word called a ‘hostword’. We present a detailed analysis of Serbo-Croat clitics within this theory, introducing a new distinction between two cases: where the clitics are attached to the verb or auxiliary, and where they are attached to some dependent of the verb.

    P-model Alternative to the T-model

    Standard linguistic analysis of syntax uses the T-model. This model requires the ordering: D-structure >> S-structure >> LF, where D-structure is the deep structure, S-structure is the surface structure, and LF is logical form. Between successive representations there is movement, which alters the order of the constituent words; movement is achieved using the principles and parameters of syntactic theory. Psychological analysis of sentence production is usually either serial or connectionist. Psychological serial models do not directly accommodate the T-model, so a new model, called the P-model, is introduced here. The P-model is different from previous linguistic and psychological models. Here it is argued that the LF representation should be replaced by a variant of Frege's three qualities (sense, reference, and force), called the Frege representation or F-representation. In the F-representation the order of elements is not necessarily the same as that in LF, and it is suggested that the correct ordering is: F-representation >> D-structure >> S-structure. This ordering appears to lead to a more natural view of sentence production and processing. Within this framework movement originates as the outcome of emphasis applied to the sentence. The requirement that the F-representation precedes the D-structure requires an account of the particular principles and parameters which pertain to movement of words between representations. In general this would imply that there is a preferred or optimal ordering of the symbolic string in the F-representation. The standard ordering is retained because the general way of producing such an optimal ordering is unclear. In this case it is possible to produce an analysis of movement between LF and D-structure similar to the usual analysis of movement between S-structure and LF.
    It is suggested that a maximal amount of information about a language's grammar and lexicon is stored, because of the necessity of analyzing corrupted data.

    Adjectival modification and multiple determiners

    The present paper deals with the distribution of the definite determiner and certain related aspects of adjectival modification in Greek DPs. As (1) shows, determiners in Greek DPs precede adjectives and adjectives precede nouns. All three categories overtly agree in gender, number and case.

    Harmony, Head Proximity, and the Near Parallels between Nominal and Clausal Linkers

    This paper puts forward a notion of harmonic word order that leads to a new generalisation over the presence or absence of disharmony: specific functional heads must cross-linguistically obey this notion of harmony absolutely, while for other categories the presence of harmony is simply a tendency. The difference between the two classes is defined by semantics. This approach allows us both to draw certain parallels between restrictions on word order in nominals and in clauses, and furthermore to explain why other expected parallels should fail to be realised completely, specifically as regards differences in the distribution of relative clauses in the NP and complement clauses in the sentence. Syntactically independent relative clause markers and subordinating complementisers share a striking restriction as regards ordering: relative clause markers are always initial in postnominal relative clauses, and final in prenominal relative clauses (Andrews 1975; Downing 1978; Lehmann 1984; Keenan 1985; De Vries 2002, 2005); similarly, initial subordinating Cs only appear in postverbal complement clauses, while final subordinating Cs are only possible where the complement clause is preverbal (Bayer 1996, 1997, 1999; Kayne 2000). In this paper, I provide new evidence from eighty genetically and geographically diverse languages of a third category sharing precisely the same restriction: linkers in the complex NP. These are syntactically independent, semantically vacuous heads, serving to mark the presence of a relationship between a noun and any kind of phrasal dependent (Rubin 2002; Den Dikken and Singhapreecha 2004; Philip 2009). The class of linkers in the NP therefore includes the ezafe in Indo-Iranian, the associative marker -a in Bantu, as well as purely functional adpositions such as of in English. Like relative clause markers and subordinating Cs, the linker always intervenes linearly between the superordinate head (the noun) and the subordinate dependent. 
    Crucially, relative clause markers, subordinating Cs, and linkers in the NP form a natural class: they are syntactically independent, semantically vacuous words serving purely to mark the presence of a relationship between head and dependent. Any member of this class is a ‘linker’. I propose a theory of disharmony whereby linearisation rules targeting heads with specified semantics can require such heads to appear in a prominent position, either initial or final, irrespective of the general headedness of the language. Linkers, being semantically vacuous, are of course impervious to such rules; they will therefore always conform to the harmonic, or optimal, word order. I propose a theory of harmony whereby the optimal word order is determined by the interaction of three independently motivated harmonic word order constraints: Head Proximity (adapted from Rijkhoff 1984, 1986, cf. Head-Final Filter, Williams 1982), the preference for uniformity in headedness (initial or final), and the preference for clausal dependents to appear in final position (Dryer 1980, 1992). Where the three constraints compete, it is always Head Proximity that takes precedence. I show that the distribution of all three types of linker is fully captured by this proposal. Moreover, this theory of ordering also accounts for another well-observed near parallel between clauses and nominals, as well as its exceptions.
    This concerns a left-right asymmetry in the distribution of clausal dependents: while in OV languages complement clauses appear with near equal frequency in both preverbal and postverbal position, in VO languages they are found uniquely in postverbal position (Dryer 1980; Hawkins 1994; Dryer 2009); similarly, in OV languages relative clauses are distributed relatively evenly between prenominal and postnominal position, whereas in VO languages they are almost always postnominal, with very few exceptions (Mallinson & Blake 1981; Hawkins 1983, 1990; Lehmann 1984; Keenan 1985; Dryer 1992, 2007, 2008; De Vries 2005). The theory predicts these exceptions to be permitted only in languages that are rigidly N-final. Hawkins’ (1983) Noun Modifier Hierarchy suggests that this prediction is borne out; apparent exceptions (cf. Dryer 2008) are found underlyingly to be N-final.

    Robust Processing of Natural Language

    Previous approaches to robustness in natural language processing usually treat deviant input by relaxing grammatical constraints whenever a successful analysis cannot be provided by "normal" means. This schema implies that error detection always comes prior to error handling, a behaviour which can hardly compete with its human model, where many erroneous situations are treated without even noticing them. The paper analyses the necessary preconditions for achieving a higher degree of robustness in natural language processing and suggests a quite different approach based on a procedure for structural disambiguation. It not only offers the possibility to cope with robustness issues in a more natural way but eventually might be suited to accommodate quite different aspects of robust behaviour within a single framework. Comment: 16 pages, LaTeX, uses pstricks.sty, pstricks.tex, pstricks.pro, pst-node.sty, pst-node.tex, pst-node.pro. To appear in: Proc. KI-95, 19th German Conference on Artificial Intelligence, Bielefeld (Germany), Lecture Notes in Computer Science, Springer 199

    Rise of the associate: an analysis of English existential constructions
