Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging

Authors: Das Dipanjan, McDonald Ryan, Nivre Joakim, Petrov Slav, Täckström Oscar
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2013

We consider the construction of part-of-speech taggers for resource-poor languages. Recently, manually constructed tag dictionaries from Wiktionary and dictionaries projected via bitext have been used as type constraints to overcome the scarcity of annotated data in this setting. In this paper, we show that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext. We present several models to this end; in particular, a partially observed conditional random field model, where coupled token and type constraints provide a partial signal for training. Averaged across eight previously studied Indo-European languages, our model achieves a 25% relative error reduction over the prior state of the art. We further present successful results on seven additional languages from different families, empirically demonstrating the applicability of coupled token and type constraints across a diverse set of languages.
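As a rough illustration of how token and type constraints might be coupled, the sketch below computes the set of tags allowed for one token. The coupling rule, the function name `allowed_tags`, the toy tag set `TAGS`, and the dictionary layout are all assumptions made for this example; the abstract does not spell out the rule, so this should not be read as the authors' exact procedure.

```python
from typing import Dict, FrozenSet, Optional, Set

# Toy tag inventory (hypothetical; a real tagger would use a full tag set).
TAGS: FrozenSet[str] = frozenset({"NOUN", "VERB", "ADJ", "DET"})


def allowed_tags(
    word: str,
    projected_tag: Optional[str],
    tag_dict: Dict[str, Set[str]],
    all_tags: FrozenSet[str] = TAGS,
) -> FrozenSet[str]:
    """Return the tags permitted for one token under an assumed coupling rule.

    tag_dict holds type constraints (word -> tags licensed by a dictionary).
    projected_tag is the token constraint projected through word alignment,
    or None when the token is unaligned.
    """
    type_tags = tag_dict.get(word)
    if type_tags is None:
        # No type constraint: trust the projected tag if there is one,
        # otherwise leave the token fully ambiguous.
        if projected_tag is not None:
            return frozenset({projected_tag})
        return all_tags
    if projected_tag is not None and projected_tag in type_tags:
        # Token and type constraints agree: prune to the projected tag.
        return frozenset({projected_tag})
    # Constraints disagree or no projection: fall back to the type constraint.
    return frozenset(type_tags)
```

The resulting per-token tag sets would define the partial supervision that a partially observed model is trained against: every tag sequence consistent with the sets is treated as a possible labeling.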

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

A Semantics-Based Approach to Design of Query Languages for Partial Information

Author: Libkin Leonid
Publication venue: ScholarlyCommons
Publication date: 01/01/1994

Most work on partial information in databases asks which operations of standard languages, like relational algebra, can still be performed correctly in the presence of nulls. In this paper a different point of view is advocated. We believe that the semantics of partiality must be clearly understood and that it should give us new design principles for languages for databases with partial information. There are different sources of partial information, such as missing information and conflicts that occur when different databases are merged. In this paper, we develop a common semantic framework for them which can be applied in a context more general than the flat relational model. This ordered semantics, which is based on ideas used in the semantics of programming languages, cleanly integrates all kinds of partial information and serves as a tool to establish connections between them. Analyzing properties of semantic domains of types suitable for representing partial information, we come up with operations that are naturally associated with those types, and we organize programming syntax around these operations. We show how the languages that we obtain can be used to ask typical queries about incomplete information in relational databases, and how they can express some previously proposed languages. Finally, we discuss a few related topics such as mixing traditional constraints with partial information and extending semantics and languages to accommodate bags and recursive types.

CiteSeerX

ScholarlyCommons@Penn

Boundedness in languages of infinite words

Authors: Bojańczyk Mikołaj, Colcombet Thomas
Publication date: 25/10/2017

We define a new class of languages of $\omega$-words, strictly extending $\omega$-regular languages. One way to present this new class is by a type of regular expressions. The new expressions are an extension of $\omega$-regular expressions where two new variants of the Kleene star $L^*$ are added: $L^B$ and $L^S$. These new exponents are used to say that parts of the input word have bounded size, and that parts of the input can have arbitrarily large sizes, respectively. For instance, the expression $(a^Bb)^\omega$ represents the language of infinite words over the letters $a,b$ where there is a common bound on the number of consecutive letters $a$. The expression $(a^Sb)^\omega$ represents a similar language, but this time the distance between consecutive $b$'s is required to tend toward infinity. We develop a theory for these languages, with a focus on decidability and closure. We define an equivalent automaton model, extending Büchi automata. The main technical result is a complementation lemma that works for languages where only one type of exponent, either $L^B$ or $L^S$, is used. We use the closure and decidability results to obtain partial decidability results for the logic MSOLB, a logic obtained by extending monadic second-order logic with new quantifiers that speak about the size of sets.
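
The two example expressions can be written out as sets of block-length sequences. This is a sketch of the semantics as described in the abstract, not a definition taken from the paper; the symbol $n_i$ (the number of letters $a$ immediately before the $i$-th letter $b$) is introduced here only for illustration.

$$
(a^B b)^\omega = \{\, a^{n_1} b\, a^{n_2} b\, a^{n_3} b \cdots \;:\; \sup_{i} n_i < \infty \,\}, \qquad
(a^S b)^\omega = \{\, a^{n_1} b\, a^{n_2} b\, a^{n_3} b \cdots \;:\; \lim_{i \to \infty} n_i = \infty \,\}
$$

Under this reading, no ultimately periodic word belongs to the second language, because its block lengths $n_i$ are eventually periodic and therefore bounded.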

arXiv.org e-Print Archive

Episciences.org