
    Bombay

    Motivated by the fact that automatic analysis of language crucially depends on semantic constituent detection and attachment resolution, we present our work on the problem of generating and linking semantically relatable sets (SRS). These sets are of the form <entity1 entity2>, <entity1 function-word entity2>, or <function-word entity>, where the entities can be single words or more complex sentence parts (such as embedded clauses). The challenge lies in finding the components of these sets, which involves solving prepositional phrase (PP) and clause attachment problems and determining empty pronominals (PRO). The method uses (i) the parse tree of the sentence, (ii) the subcategorization frames of lexical items, (iii) the lexical properties of the words, and (iv) lexical resources such as WordNet and the Oxford Advanced Learner's Dictionary (OALD). The components within the sets, and the sets themselves, are linked using the semantic relations of the Universal Networking Language (UNL), an interlingua for machine translation. The work forms part of a UNL-based MT system, in which the source language is analysed into semantic graphs and the target language is generated from these graphs. The system has been tested on the Penn Treebank, and the results indicate the promise and effectiveness of our approach.
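As an illustration of the three SRS shapes, here is a minimal Python sketch that models an SRS as a small record with an optional first entity, an optional function word, and an optional UNL relation label. The class `SRS`, its field names, and the sample relation labels (`agt`, `obj`, `plc`) are illustrative assumptions; the abstract does not say how the system represents these sets internally. The usage example hand-codes sets for a toy sentence, assuming the PP "in the park" attaches to the verb.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SRS:
    """Hypothetical record for one semantically relatable set (SRS).

    Supported shapes:
      <entity1 entity2>                 entity1 and entity2 set
      <entity1 function-word entity2>   all three set
      <function-word entity>            function_word and entity2 set
    """
    entity1: Optional[str] = None
    function_word: Optional[str] = None
    entity2: Optional[str] = None
    unl_relation: Optional[str] = None  # illustrative label, e.g. "agt", "obj", "plc"

    def __post_init__(self) -> None:
        # Reject records that match none of the three SRS shapes.
        if self.entity2 is None:
            raise ValueError("every SRS needs a final entity")
        if self.entity1 is None and self.function_word is None:
            raise ValueError("a lone entity is not an SRS")

    def shape(self) -> str:
        """Name the SRS form this record instantiates."""
        if self.entity1 is None:
            return "<function-word entity>"
        if self.function_word is None:
            return "<entity1 entity2>"
        return "<entity1 function-word entity2>"


# Hand-coded sets for "The boy saw the girl in the park", assuming the PP
# attaches to the verb; relation labels are illustrative, not system output.
sets = [
    SRS(function_word="the", entity2="boy"),
    SRS(entity1="saw", entity2="boy", unl_relation="agt"),
    SRS(entity1="saw", entity2="girl", unl_relation="obj"),
    SRS(entity1="saw", function_word="in", entity2="park", unl_relation="plc"),
]
for s in sets:
    print(s.shape(), s)
```

Making the record frozen keeps each set hashable, so sets could later serve as nodes or edge endpoints in a semantic graph.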

    Parsers know best: German PP attachment revisited

    In this paper, we revisit the PP attachment problem, which has been identified as one of the major sources of parser errors, and discuss shortcomings of recent work. In particular, we show that using gold information to extract attachment candidates, together with the lack of any comparison between the system's output and that of a full syntactic parser, leads to an overly optimistic assessment of the results. We address these issues by presenting a realistic evaluation of the potential of different PP attachment systems, using fully predicted information as system input. We compare our results against the output of a strong neural parser and show that the full parsing approach is superior to modeling PP attachment disambiguation as a separate task.
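The evaluation point above can be made concrete with a small sketch under assumed data structures: gold and predicted PP attachments are dicts keyed by a hypothetical (sentence id, preposition index) pair, each mapping to the index of the attachment head. Scoring only the PPs a system actually extracted hides extraction errors, while scoring against every gold PP counts missed PPs as wrong. This illustrates why the choice of evaluation set matters; it is not the paper's exact evaluation protocol, and spurious predicted PPs are simply ignored here.

```python
from typing import Dict, Tuple

# Hypothetical key: (sentence_id, index of the preposition token).
PPKey = Tuple[int, int]


def attachment_accuracy(gold: Dict[PPKey, int], predicted: Dict[PPKey, int]) -> Tuple[float, float]:
    """Return (optimistic, realistic) PP attachment accuracy.

    Both dicts map a PP to the token index of its attachment head.
    optimistic: correct / number of gold PPs the system extracted.
    realistic:  correct / number of all gold PPs, so missed PPs count as errors.
    Predicted PPs with no gold counterpart are ignored in both scores.
    """
    found = [pp for pp in predicted if pp in gold]
    correct = sum(1 for pp in found if predicted[pp] == gold[pp])
    optimistic = correct / len(found) if found else 0.0
    realistic = correct / len(gold) if gold else 0.0
    return optimistic, realistic


# Toy data: the extractor misses the PP at (1, 7) and misattaches (0, 6).
gold = {(0, 3): 1, (0, 6): 5, (1, 2): 0, (1, 7): 4}
predicted = {(0, 3): 1, (0, 6): 2, (1, 2): 0}
optimistic, realistic = attachment_accuracy(gold, predicted)
print(f"optimistic={optimistic:.3f} realistic={realistic:.3f}")
```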

    A New Approach to Journal and Conference Name Disambiguation through K-Means Clustering of Internet and Document Surrogates

    Bibliometrics has a long history in Information Science. The validity of any bibliometric analysis depends on accurate citations. We introduce an approach that combines author names and Internet document surrogates with K-means clustering to disambiguate journal and conference titles automatically. To evaluate the quality of this approach, we used records from the Digital Bibliography & Library Project (DBLP). We found that there are 2.54±1.52 authors per article. A manual analysis of 125 articles selected at random from the 1.18 million DBLP citations revealed only seven article pairs from the same publication venue. We describe the changes in cluster properties as the number of articles increases from 100 to 25,000. Our findings suggest that additional features are required to disambiguate journal and conference names accurately. As 60.86% of the DBLP articles are published at conferences, future efforts should focus on conference name disambiguation.
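A minimal sketch of the clustering step, assuming each article is represented only by a binary bag-of-authors vector (the Internet document surrogates used in the study are omitted) and clustered with plain Lloyd's K-means under squared Euclidean distance. Initializing centroids with the first k articles is a simplification chosen for reproducibility, and it makes the result sensitive to input order. The helper names and toy author lists are hypothetical. Given the finding above that additional features are needed, author names alone should not be expected to separate real venues this cleanly.

```python
from typing import List, Sequence

Vector = List[float]


def vectorize(author_lists: Sequence[Sequence[str]]) -> List[Vector]:
    """Binary bag-of-authors vector per article, over a sorted author vocabulary."""
    vocab = sorted({a for authors in author_lists for a in authors})
    index = {a: i for i, a in enumerate(vocab)}
    vectors = []
    for authors in author_lists:
        v = [0.0] * len(vocab)
        for a in authors:
            v[index[a]] = 1.0
        vectors.append(v)
    return vectors


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iters: int = 20) -> List[int]:
    """Lloyd's K-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid (ties go to the lowest index).
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical author lists: articles 0, 2, 4 share "A. Smith"; 1 and 3 share "E. Park".
articles = [
    ["A. Smith", "B. Jones"],
    ["D. Kim", "E. Park"],
    ["A. Smith", "C. Lee"],
    ["E. Park", "F. Chen"],
    ["B. Jones", "A. Smith"],
]
labels = kmeans(vectorize(articles), k=2)
print(labels)
```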

    A statistical approach to a verb vector task classifier

    A thesis submitted to the University of Bedfordshire, in fulfilment of the requirements for the degree of Master of Science by research.

    How to enable a service robot to understand its user's intention is an active topic of research. Based on this understanding, the robot can coordinate and adjust its behaviours to provide the desired assistance and services to the user as a capable partner. Active Robot Learning (ARL) is an approach to developing an understanding of human intention. The task action bank, which stores task categories, is part of ARL. In this approach, a robot actively performs test actions in order to infer its user's intention from the user's response to each action. This thesis presents a statistical approach to clustering verbs according to the basic action they require of the robot. A parser is built to process a corpus and analyse the probability of the verb feature vector. For example, when the user says "bring me a cup of coffee", this means the same as "give me a cup of coffee", and the parser can use the statistical method to identify "bring" and "give" as similar verbs. Experimental results show the collocation between semantically related verbs, which can be further utilised to establish a test action bank for ARL.
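The verb-similarity idea can be sketched as follows, under assumptions the abstract does not spell out: each verb's feature vector is taken to be its estimated distribution over direct objects, P(object | verb), and verbs are compared with cosine similarity. Both the feature choice and the cosine measure are stand-ins, not necessarily the thesis's statistical method, and the (verb, object) pairs are hypothetical rather than parsed from a real corpus.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def verb_vectors(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each verb to P(object | verb), estimated from (verb, object) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for verb, obj in pairs:
        counts[verb][obj] += 1
    vectors = {}
    for verb, c in counts.items():
        total = sum(c.values())
        vectors[verb] = {obj: n / total for obj, n in c.items()}
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(verb: str, vectors: Dict[str, Dict[str, float]]) -> str:
    """Return the other verb whose object distribution is closest by cosine."""
    return max((v for v in vectors if v != verb), key=lambda v: cosine(vectors[verb], vectors[v]))


# Hypothetical (verb, direct object) pairs, as a parser might extract them.
pairs = [
    ("bring", "coffee"), ("bring", "cup"), ("bring", "book"),
    ("give", "coffee"), ("give", "coffee"), ("give", "cup"), ("give", "book"),
    ("open", "door"), ("open", "window"), ("open", "book"),
]
vectors = verb_vectors(pairs)
print(most_similar("bring", vectors))  # prints "give"
```

Verbs that share many objects ("bring" and "give" both take "coffee" and "cup") end up close, while "open" overlaps with them only on "book".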