A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them.
Comment: 44 pages, to appear in Computational Linguistic
Serbo-Croat Clitics and Word Grammar
Serbo-Croat has a complex system of clitics which raise interesting problems for any theory of the interface between syntax and morphology. After summarising the data we review previous analyses (mostly within the generative tradition), all of which are unsatisfactory in various ways. We then explain how Word Grammar handles clitics: as words whose form is an affix rather than the usual ‘word-form’. Like other affixes, clitics need a word to accommodate them, but in the case of clitics this is a special kind of word called a ‘hostword’. We present a detailed analysis of Serbo-Croat clitics within this theory, introducing a new distinction between two cases: where the clitics are attached to the verb or auxiliary, and where they are attached to some dependent of the verb
P-model Alternative to the T-model
Standard linguistic analysis of syntax uses the T-model. This model
requires the ordering: D-structure → S-structure → LF,
where D-structure is the deep structure,
S-structure is the surface structure, and LF is logical form.
Between each of these representations there is movement which alters
the order of the constituent words; movement is achieved using the principles
and parameters of syntactic theory. Psychological analysis of sentence
production is usually either serial or connectionist. Psychological serial
models do not readily accommodate the T-model, so a new model, called the
P-model, is introduced here. The P-model is different from previous
linguistic and psychological models. Here it is argued that the LF
representation should be replaced by a variant
of Frege's three qualities (sense, reference, and force),
called the Frege representation or F-representation.
In the F-representation the order of elements is not necessarily the same as
that in LF and it is suggested that the correct ordering is:
F-representation → D-structure → S-structure.
This ordering appears to lead to a more natural
view of sentence production and processing. Within this framework movement
originates as the outcome of emphasis applied to the sentence. The
requirement that the F-representation precedes the D-structure needs a picture
of the particular principles and parameters which pertain to movement of words
between representations. In general this would imply that there is a
preferred or optimal ordering of the symbolic string in the F-representation.
The standard ordering is retained because the general way of producing
such an optimal ordering is unclear. In this case it is possible to produce
an analysis of movement between LF and D-structure similar to the usual
analysis of movement between S-structure and LF.
It is suggested that a maximal amount of information about
a language's grammar and lexicon is stored,
because of the necessity of analyzing corrupted data
Adjectival modification and multiple determiners
The present paper deals with the distribution of the definite determiner and certain related aspects of adjectival modification in Greek DPs. As (1) shows, determiners in Greek DPs precede adjectives and adjectives precede nouns. All three categories overtly agree in gender, number and case
Harmony, Head Proximity, and the Near Parallels between Nominal and Clausal Linkers
This paper puts forward a notion of harmonic word order that leads to a new generalisation over the presence or absence of disharmony: specific functional heads must cross-linguistically obey this notion of harmony absolutely, while for other categories the presence of harmony is simply a tendency. The difference between the two classes is defined by semantics. This approach allows us both to draw certain parallels between restrictions on word order in nominals and in clauses, and furthermore to explain why other expected parallels should fail to be realised completely, specifically as regards differences in the distribution of relative clauses in the NP and complement clauses in the sentence. Syntactically independent relative clause markers and subordinating complementisers share a striking restriction as regards ordering: relative clause markers are always initial in postnominal relative clauses, and final in prenominal relative clauses (Andrews 1975; Downing 1978; Lehmann 1984; Keenan 1985; De Vries 2002, 2005); similarly, initial subordinating Cs only appear in postverbal complement clauses, while final subordinating Cs are only possible where the complement clause is preverbal (Bayer 1996, 1997, 1999; Kayne 2000). In this paper, I provide new evidence from eighty genetically and geographically diverse languages of a third category sharing precisely the same restriction: linkers in the complex NP. These are syntactically independent, semantically vacuous heads, serving to mark the presence of a relationship between a noun and any kind of phrasal dependent (Rubin 2002; Den Dikken and Singhapreecha 2004; Philip 2009). The class of linkers in the NP therefore includes the ezafe in Indo-Iranian, the associative marker -a in Bantu, as well as purely functional adpositions such as of in English. Like relative clause markers and subordinating Cs, the linker always intervenes linearly between the superordinate head (the noun) and the subordinate dependent. 
Crucially, relative clause markers, subordinating Cs, and linkers in the NP form a natural class: they are syntactically independent, semantically vacuous words serving purely to mark the presence of a relationship between head and dependent. Any member of this class is a ‘linker’. I propose a theory of disharmony whereby linearisation rules targeting heads with specified semantics can require such heads to appear in a prominent position, either initial or final, irrespective of the general headedness of the language. Linkers, being semantically vacuous, are of course impervious to such rules; they will therefore always conform to the harmonic, or optimal, word order. I propose a theory of harmony whereby the optimal word order is determined by the interaction of three independently motivated harmonic word order constraints: Head Proximity (adapted from Rijkhoff 1984, 1986, cf. Head-Final Filter, Williams 1982), the preference for uniformity in headedness (initial or final), and the preference for clausal dependents to appear in final position (Dryer 1980, 1992). Where the three constraints compete, it is always Head Proximity that takes precedence. I show that the distribution of all three types of linker is fully captured by this proposal. Moreover, this theory of ordering also accounts for another well observed near parallel between clauses and nominals, as well as its exceptions. 
This concerns a left-right asymmetry in the distribution of clausal dependents: while in OV languages complement clauses appear with near equal frequency in both preverbal and postverbal position, in VO languages they are found uniquely in postverbal position (Dryer 1980; Hawkins 1994; Dryer 2009); similarly, in OV languages relative clauses are distributed relatively evenly between prenominal and postnominal position, whereas in VO languages they are almost always postnominal, with very few exceptions (Mallinson & Blake 1981; Hawkins 1983, 1990; Lehmann 1984; Keenan 1985; Dryer 1992, 2007, 2008; De Vries 2005). The theory predicts these exceptions to be permitted only in languages that are rigidly N-final. Hawkins’ (1983) Noun Modifier Hierarchy suggests that this prediction is borne out; apparent exceptions (cf. Dryer 2008) are found underlyingly to be N-final
Robust Processing of Natural Language
Previous approaches to robustness in natural language processing usually
treat deviant input by relaxing grammatical constraints whenever a successful
analysis cannot be provided by ``normal'' means. This schema implies that
error detection always precedes error handling, a behaviour that can hardly
compete with its human model, where many erroneous situations are handled
without even being noticed.
The paper analyses the necessary preconditions for achieving a higher degree
of robustness in natural language processing and suggests a quite different
approach based on a procedure for structural disambiguation. It not only offers
the possibility to cope with robustness issues in a more natural way but
eventually might be suited to accommodate quite different aspects of robust
behaviour within a single framework.
Comment: 16 pages, LaTeX, uses pstricks.sty, pstricks.tex, pstricks.pro,
pst-node.sty, pst-node.tex, pst-node.pro. To appear in: Proc. KI-95, 19th
German Conference on Artificial Intelligence, Bielefeld (Germany), Lecture
Notes in Computer Science, Springer 199