710 research outputs found

    The logic and linguistic model for automatic extraction of collocation similarity

    The article discusses the automatic identification of collocation similarity. Semantic analysis is one of the most advanced and most difficult NLP tasks. The main problem of semantic processing is determining the polysemy and synonymy of linguistic units, and the task becomes more complicated for word collocations. The paper proposes a logical-linguistic model for automatically determining semantic similarity between collocations in Ukrainian and English. The proposed model formalizes semantic equivalence of collocations by means of the semantic and grammatical characteristics of collocates. The basic idea of this approach is that the morphological, syntactic and semantic characteristics of lexical units must be taken into account to identify collocation similarity. The basic mathematical tools of the model are logical-algebraic equations of the algebra of finite predicates. The model examines verb-noun and noun-adjective collocations in Ukrainian and English, which consist of words belonging to the main parts of speech. The model allows semantically equivalent collocations to be extracted from semi-structured and unstructured texts. Implementations of the model will make it possible to recognize semantically equivalent collocations automatically. Using the model can increase the effectiveness of natural language processing tasks such as information extraction, ontology generation, sentiment analysis and others.

    A Hybrid Extraction Model for Chinese Noun/Verb Synonymous bi-gram Collocations


    Acquisition semi-automatique de collocations à partir de corpus monolingues et multilingues comparables

    International audience. This article presents a method for the semi-automatic acquisition of collocations. Our monolingual extraction estimates, for each co-occurrence, its capacity to be a collocation according to a statistical measure modelling an essential characteristic (the fact that a collocation occurs more often than by chance), then performs automatic filtering (using conceptual vectors) to retain only collocations of a certain semantic type, and finally performs a further filtering based on manually entered data. Our bilingual extraction is carried out on comparable corpora and aims to extract collocations that are not necessarily word-for-word translations of one another. Our evaluation demonstrates the value of combining automatic extraction and manual intervention to acquire collocations and thereby help complete multilingual lexical databases.

    A hybrid extraction model for Chinese noun/verb synonym bi-gram

    2011-2012 > Academic research: refereed > Refereed conference paper; Version of Record; Publishe

    The Semantic Prosody of Natural Phenomena in the Qur’an: A Corpus-Based Study

    This thesis explores the Semantic Prosody (SP) of natural phenomena in the Qur’an and five of its prominent English translations [Pickthall (1930), Yusuf Ali (1939/revised edition 1987), Arberry (1957), Saheeh International (1997), and Abdel Haleem (2004)]. SP, scarcely explored in Qur’anic research, is defined as ‘a form of meaning established through the proximity of a consistent series of collocates’ (Louw 2000, p.50). Theoretically, it is both an evaluative prosody (i.e., lexical items collocating with semantic word classes that are positive, negative, or neutral) and a discourse prosody (i.e., having a communicative purpose). Given the stylistic uniqueness of the Qur’an and considering that SP can be examined empirically via corpora, the present study explores the SP of 154 words associated with nature referenced throughout the Qur’an using Corpus Linguistics techniques. First, the Python-based Natural Language Toolkit was used to define nature terms via WordNet, to disambiguate their variant forms with stemmers, and to compute their frequencies. Once frequencies were found, a quantitative analysis using Evert’s (2008) five-step statistical analysis was carried out on the 30 most frequent terms to investigate their collocations and SPs. Following this, a qualitative analysis was conducted using the Extended Lexical Unit, via concordance, to analyse collocations, and Lexical-Functional Grammar to find the variation of meanings produced by lexico-grammatical patterns. Finally, the resulting datasets were aligned to evaluate their congruency with the Qur’an. The findings confirm that words referring to nature in the Qur’an do have semantic prosody. For example, astronomical bodies are primed to occur in predominantly positive collocations referring to glorifying God, while weather phenomena are primed to occur in negative ones referring to Day of Judgment calamities. In addition, results show that Abdel Haleem’s translation can be considered the most congruent. This research develops an approach to explore themes (e.g., nature) via SP analysis in texts and their translations and provides several linguistic resources that can be used for future corpus-based studies on the language of the Qur’an.