29 research outputs found

    Compiling Linguistic Constraints into Finite State Automata

    This paper deals with linguistic constraints encoded in the form of (binary) tables, generally called lexicon-grammar tables. We describe a unified method to compile sets of tables of linguistic constraints into Finite State Automata. This method has been implemented in the linguistic platform Unitex.

    Detecting Latin-Based Medical Terminology in Croatian Texts

    Whatever the main language of texts in the medical domain, there is always evidence of the use of Latin-derived words and formative elements in terminology development. Generally speaking, this usage shows language-specific morpho-semantic behaviors in forming both technical-scientific and common-usage words. Nevertheless, the use of Latin in Croatian medical texts does not seem consistent, because different mechanisms of word formation may be applied to the same term. In our pursuit to map all the different occurrences of the same concept to a single one, we propose a model designed within NooJ and based on dictionaries and morphological grammars. Starting from the manual detection of nouns and their variations, we recognize some word formation mechanisms and develop grammars suitable for recognizing Latinisms and Croatinized Latin medical terminology.

    On Heads and Coordination in Valence Acquisition

    The aim of this paper is to present the design of a partial syntactic annotation of the IPI PAN Corpus of Polish [22] and the corresponding extension of the corpus search engine Poliqarp [25,12], developed at the Institute of Computer Science PAS and currently employed in Polish and Portuguese corpora projects. In particular, we will argue for the need to distinguish between, and represent both, syntactic and semantic heads, and we will sketch the representation of coordination, an area traditionally controversial in both theoretical and computational linguistics. The annotation is designed to maximise the usefulness of the resulting corpus for the task of automatic valence acquisition.

    Searching Text Corpora with grep


    INTEX 4.1 for Windows: A Walkthrough


    Local Grammars and Parsing Coordination of Nouns in Serbo-Croatian
