
    A Robust Transformation-Based Learning Approach Using Ripple Down Rules for Part-of-Speech Tagging

    In this paper, we propose a new approach to constructing a system of transformation rules for the Part-of-Speech (POS) tagging task. Our approach is based on an incremental knowledge acquisition method in which rules are stored in an exception structure and new rules are added only to correct the errors of existing rules, thus allowing systematic control of the interaction between the rules. Experimental results on 13 languages show that our approach is fast in terms of training time and tagging speed. Furthermore, our approach obtains very competitive accuracy in comparison to state-of-the-art POS and morphological taggers. Comment: Version 1: 13 pages. Version 2: Submitted to AI Communications - the European Journal on Artificial Intelligence. Version 3: Resubmitted after major revisions. Version 4: Resubmitted after minor revisions. Version 5: to appear in AI Communications (accepted for publication on 3/12/2015).

    Acoustic Modelling for Under-Resourced Languages

    Automatic speech recognition systems have so far been developed for only a very few of the 4,000-7,000 existing languages. In this thesis we examine methods to rapidly create acoustic models in new, possibly under-resourced languages, in a time- and cost-effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.

    A Lexicalized Tree Adjoining Grammar for Thai

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    Ambiguous (((Par(t)(it))((ion))(s))(in)) Thai Text

    Colour Communication Within Different Languages

    For computational methods aiming to reproduce colour names that are meaningful to speakers of different languages, the mapping between perceptual and linguistic aspects of colour is a problem of central information processing. This thesis advances the field of computational colour communication within different languages in five main directions. First, we show that web-based experimental methodologies offer considerable advantages in obtaining a large number of colour naming responses in British and American English, Greek, Russian, Thai and Turkish. We continue with the application of machine learning methods to discover criteria in linguistic, behavioural and geometric features of colour names that distinguish classes of colours. We show that primary colour terms do not form a coherent class, whilst achromatic and basic classes do. We then propose and evaluate a computational model trained on human responses in the online experiment to automate the assignment of colour names in different languages across the full three-dimensional colour gamut. Fourth, we determine for the first time the location of colour names within a physiologically-based cone excitation space through an unconstrained colour naming experiment using a calibrated monitor under controlled viewing conditions. We show a good correspondence between online and offline datasets and confirm the validity of both experimental methodologies for estimating colour naming functions in laboratory and real-world monitor settings. Finally, we present a novel information theoretic measure, called dispensability, for colour categories that predicts a gradual scale of basicness across languages from both web- and laboratory-based unconstrained colour naming datasets. As a result, this thesis contributes experimental and computational methodologies towards the development of multilingual colour communication schemes.

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of WWW and digital libraries; and (iv) evaluation of NLP systems.