426,048 research outputs found

    Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging

    Get PDF
    We consider the construction of part-of-speech taggers for resource-poor languages. Recently, manually constructed tag dictionaries from Wiktionary and dictionaries projected via bitext have been used as type constraints to overcome the scarcity of annotated data in this setting. In this paper, we show that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext. We present several models to this end; in particular a partially observed conditional random field model, where coupled token and type constraints provide a partial signal for training. Averaged across eight previously studied Indo-European languages, our model achieves a 25% relative error reduction over the prior state of the art. We further present successful results on seven additional languages from different families, empirically demonstrating the applicability of coupled token and type constraints across a diverse set of languages

    A Semantics-Based Approach to Design of Query Languages for Partial Information

    Get PDF
    Most of work on partial information in databases asks which operations of standard languages, like relational algebra, can still be performed correctly in the presence of nulls. In this paper a different point of view is advocated. We believe that the semantics of partiality must be clearly understood and it should give us new design principles for languages for databases with partial information. There are different sources of partial information, such as missing information and conflicts that occur when different databases are merged. In this paper, we develop a common semantic framework for them which can be applied in a context more general than the flat relational model. This ordered semantics, which is based on ideas used in the semantics of programming languages, cleanly intergrates all kinds of partial information and serves as a tool to establish connections between them. Analyzing properties of semantic domains of types suitable for representing partial information, we come up with operations that are naturally associated with those types, and we organize programming syntax around these operations. We show how the languages that we obtain can be used to ask typical queries about incomplete information in relational databases, and how they can express some previously proposed languages. Finally, we discuss a few related topics such as mixing traditional constraints with partial information and extending semantics and languages to accommodate bags and recursive types

    Boundedness in languages of infinite words

    Full text link
    We define a new class of languages of ω\omega-words, strictly extending ω\omega-regular languages. One way to present this new class is by a type of regular expressions. The new expressions are an extension of ω\omega-regular expressions where two new variants of the Kleene star LL^* are added: LBL^B and LSL^S. These new exponents are used to say that parts of the input word have bounded size, and that parts of the input can have arbitrarily large sizes, respectively. For instance, the expression (aBb)ω(a^Bb)^\omega represents the language of infinite words over the letters a,ba,b where there is a common bound on the number of consecutive letters aa. The expression (aSb)ω(a^Sb)^\omega represents a similar language, but this time the distance between consecutive bb's is required to tend toward the infinite. We develop a theory for these languages, with a focus on decidability and closure. We define an equivalent automaton model, extending B\"uchi automata. The main technical result is a complementation lemma that works for languages where only one type of exponent---either LBL^B or LSL^S---is used. We use the closure and decidability results to obtain partial decidability results for the logic MSOLB, a logic obtained by extending monadic second-order logic with new quantifiers that speak about the size of sets
    corecore