    Statistical dependency parsing of Turkish

    This paper presents results from the first statistical dependency parser for Turkish. Turkish is a free constituent order language with complex agglutinative inflectional and derivational morphology. It presents interesting challenges for statistical parsing because, in general, dependency relations hold between “portions” of words called inflectional groups. We have explored statistical models that use different representational units for parsing. We have used the Turkish Dependency Treebank to train and test our parser, but have limited this initial exploration to the subset of treebank sentences with only left-to-right, non-crossing dependency links. Our results indicate that the best accuracy in terms of the dependency relations between inflectional groups is obtained when we use inflectional groups as units in parsing and when contexts around the dependent are employed.

    Implicit dialogical premises, explanation as argument: a corpus-based reconstruction

    This paper focuses on an explanation in a newspaper article: why new European Union citizens will come to the UK from Eastern Europe (e.g., because of available jobs). Using a corpus-based method of analysis, I show how regular target readers have been positioned to generate premises in dialogue with the explanation propositions, and thus into an understanding of the explanation as an argument, one which contains a biased conclusion not apparent in the text. Employing this method, and in particular ‘corpus comparative statistical keywords’, I show how two issues can be looked at afresh: implicit premise recovery and the argument/explanation distinction.

    Derivation of Czech verbs and the category of aspect

    The present paper deals with changes in the category of grammatical aspect during the derivation of verbs from other verbs in Czech. After summarizing the main issues of the long-standing debate over aspect in Czech, the formation of aspectual pairs is presented as an integral part of the derivation of Czech verbs. In line with this view, the category of aspect was used as an important feature in modelling verb-to-verb derivation in a language data resource capturing the derivational morphology of Czech. The paper presents the simple set of criteria according to which verbs in the database are organized.

    Automatic Extraction of Subcategorization from Corpora

    We describe a novel technique and implemented system for constructing a subcategorization dictionary from textual corpora. Each dictionary entry encodes the relative frequency of occurrence of a comprehensive set of subcategorization classes for English. An initial experiment, on a sample of 14 verbs which exhibit multiple complementation patterns, demonstrates that the technique achieves accuracy comparable to previous approaches, which are all limited to a highly restricted set of subcategorization classes. We also demonstrate that a subcategorization dictionary built with the system improves the accuracy of a parser by an appreciable amount.
    Comment: 8 pages; requires aclap.sty. To appear in ANLP-9

    PROSTY CZY ARCHAICZY: NOWY CZESKI KODEKS CYWILNY PŁYNIE POD PRĄD

    The article presents the discussion on the wording of the new Civil Code of the Czech Republic, which becomes effective on January 1, 2014. Some critics claim that the Code contains many newly coined or re-introduced terms which are unknown to the general public and may even feel archaic. Inspired by this debate, a survey was carried out in which a group of students was asked to rate their familiarity with ten terms selected from the new Code and to mark the terms with respect to their perceived stylistic features. All the terms had been analysed with respect to their relative frequency in various text types using the Czech National Corpus. Only one term was assessed as known by more than 40% of the subjects. The same portion of the subjects marked six terms as archaic and five terms as strangely formed. The results show that the debate on the wording was justified. Nevertheless, the requirement that legal documents be accessible to the general public should be weighed with due consideration of the various functions, situations and contexts in which individual genres and text types are used.