426,048 research outputs found
Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging
We consider the construction of part-of-speech taggers for resource-poor languages. Recently, manually constructed tag dictionaries from Wiktionary and dictionaries projected via bitext have been used as type constraints to overcome the scarcity of annotated data in this setting. In this paper, we show that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext. We present several models to this end; in particular a partially observed conditional random field model, where coupled token and type constraints provide a partial signal for training. Averaged across eight previously studied Indo-European languages, our model achieves a 25% relative error reduction over the prior state of the art. We further present successful results on seven additional languages from different families, empirically demonstrating the applicability of coupled token and type constraints across a diverse set of languages
A Semantics-Based Approach to Design of Query Languages for Partial Information
Most of work on partial information in databases asks which operations of standard languages, like relational algebra, can still be performed correctly in the presence of nulls. In this paper a different point of view is advocated. We believe that the semantics of partiality must be clearly understood and it should give us new design principles for languages for databases with partial information. There are different sources of partial information, such as missing information and conflicts that occur when different databases are merged. In this paper, we develop a common semantic framework for them which can be applied in a context more general than the flat relational model. This ordered semantics, which is based on ideas used in the semantics of programming languages, cleanly intergrates all kinds of partial information and serves as a tool to establish connections between them. Analyzing properties of semantic domains of types suitable for representing partial information, we come up with operations that are naturally associated with those types, and we organize programming syntax around these operations. We show how the languages that we obtain can be used to ask typical queries about incomplete information in relational databases, and how they can express some previously proposed languages. Finally, we discuss a few related topics such as mixing traditional constraints with partial information and extending semantics and languages to accommodate bags and recursive types
Boundedness in languages of infinite words
We define a new class of languages of -words, strictly extending
-regular languages.
One way to present this new class is by a type of regular expressions. The
new expressions are an extension of -regular expressions where two new
variants of the Kleene star are added: and . These new
exponents are used to say that parts of the input word have bounded size, and
that parts of the input can have arbitrarily large sizes, respectively. For
instance, the expression represents the language of infinite
words over the letters where there is a common bound on the number of
consecutive letters . The expression represents a similar
language, but this time the distance between consecutive 's is required to
tend toward the infinite.
We develop a theory for these languages, with a focus on decidability and
closure. We define an equivalent automaton model, extending B\"uchi automata.
The main technical result is a complementation lemma that works for languages
where only one type of exponent---either or ---is used.
We use the closure and decidability results to obtain partial decidability
results for the logic MSOLB, a logic obtained by extending monadic second-order
logic with new quantifiers that speak about the size of sets
- …