2,658 research outputs found

    HiPHET: A Hybrid Approach to Translate Code Mixed Language (Hinglish) to Pure Languages (Hindi and English)

    Bilingual code-mixed (hybrid) languages have become very popular in India as a result of the spread of Western technology in the form of television, the Internet and social media. With the growing use of code-mixed languages in day-to-day communication, a need has arisen to maintain the integrity of Indian languages. To meet this need, a tool named Hinglish to Pure Hindi and English Translator was developed. The tool translates in three ways: Hinglish to Pure Hindi and Pure English, Pure Hindi to Pure English, and vice versa. With Hinglish input sentences, the tool achieved an accuracy of 91% for Hindi output sentences and 84% for English output sentences. The paper also compares the tool with another similar tool.

    THE IDENTIFICATION OF FORMULAIC SEQUENCES IN URDU LANGUAGE AND THEIR PEDAGOGICAL IMPLICATION FOR SLA (ESL/USL)

    This study explores formulaicity in the Urdu language and its pedagogical implications for second language acquisition, both for learners of English as a second language and for learners of Urdu as a second language. It is believed that formulaic sequences, or prefabs, make up more than fifty percent of a language. These formulaic sequences are of various kinds, encompassing idioms, proverbs, collocations and sometimes simple fillers. For the current study, data will be collected from two widely circulated Urdu newspapers. The data will consist of lexical chunks or formulas, which will be identified on the basis of eleven criteria proposed by Wray and Namba (2003). To maintain inter-rater reliability, the data will be shared with an Urdu language expert. After identification, the formulaic sequences will be classified into six classes. Results of the pilot study show that there is formulaicity in the Urdu language: like many other languages, Urdu is replete with almost all kinds of formulaic sequences.

    Classification of Phraseological Units with a Zoonym Component in the German Language

    The article considers different approaches to the classification and description of German phraseological units with a zoonym component in existing studies. It draws attention to individual characteristics of these units, describes the different approaches to classifying them, and analyzes the zoonym components of German. It concludes that further study is needed of the ways and means by which zoonymic phraseological units are formed and of their characteristics in German. (The article investigates the peculiarities of German phraseology with a zoonym component. The problem is studied through a subject classification of the phraseological units and through an investigation of phraseological synonyms and variants with a zoonym component in modern German. A possible classification of the studied idioms is also proposed, dividing them into six groups: zoophraseology, ornithophraseology, entomophraseology, ichthyophraseology, reptiliophraseology and amphibiophraseology. Zoophraseology and ornithophraseology are further divided into phraseology with the names of wild animals and phraseology with the names of domestic animals. The article also addresses some theoretical problems of phraseology, for example methods for identifying phraseological units and proverbs in modern German.)

    Genetic Algorithm (GA) in Feature Selection for CRF Based Manipuri Multiword Expression (MWE) Identification

    This paper deals with the identification of Multiword Expressions (MWEs) in Manipuri, a highly agglutinative Indian language. Manipuri is listed in the Eighth Schedule of the Indian Constitution. MWEs play an important role in Natural Language Processing (NLP) applications such as Machine Translation, Part of Speech tagging, Information Retrieval and Question Answering. Feature selection is an important factor in the recognition of Manipuri MWEs using Conditional Random Fields (CRF). The drawbacks of manually selecting appropriate features for running the CRF motivated us to use a Genetic Algorithm (GA). Using the GA, we are able to find the optimal features to run the CRF. We ran the feature selection for fifty generations, using three-fold cross-validation as the fitness function. This model achieved a Recall (R) of 64.08%, a Precision (P) of 86.84% and an F-measure (F) of 73.74%, an improvement over CRF-based Manipuri MWE identification without the GA. Comment: 14 pages, 6 figures, see http://airccse.org/journal/jcsit/1011csit05.pd
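
    The reported F-measure can be checked against the reported precision and recall. The sketch below assumes the abstract uses the balanced F-measure (F1, the harmonic mean of P and R); the abstract does not state which variant was used, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values quoted in the abstract, in percent.
reported_precision = 86.84
reported_recall = 64.08

# Assumption: balanced F-measure (beta = 1).
f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {f1:.2f}%")
```

    Under that assumption the computed value agrees with the 73.74% quoted in the abstract.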

    The lexico-phraseology of THE and A/AN in spoken English: a corpus-based study

    The English articles (THE, A, AN) are normally described in terms of the grammar of the language. This is only natural, since they are extremely frequent, fit into certain well-defined syntactic slots, and usually help to communicate only very broad aspects of textual meaning. However, as John Sinclair has pointed out (1999, pp.160-161), the articles are also found as components of many lexico-phraseological units, and in such cases a normal grammatical description may not be of relevance. An example he gives is the presence of A in the phrase 'come to a head', where ‘A has little more status than that of a letter of the alphabet’ (p.161). Sinclair also makes the observation that, ‘I do not know of an estimate of the proportion of instances of A, for example, that are not a realisation of the choice of article but of the realisation of part of a multi-word expression.’ (p.161). The present paper addresses the questions raised by Sinclair, and does so with reference to both the definite and the indefinite article. It focuses, in particular, on the spoken language, and presents the results of analyses of random samples of the articles in the spoken component of the British National Corpus (hereafter BNC-spkn). According to the data in Leech et al (2001, p.144), THE is the most frequent word in BNC-spkn and A is the sixth most frequent (a rank position which remains unaltered when the frequencies of A and AN are combined). Using the BNCweb interface, and specifying that the relevant word forms should be ‘articles’, the total numbers of tokens are: an 19,049; a 200,004; the 409,060. Since the numbers are very high, the samples investigated also contained a reasonably large number of tokens (500). The relative samples corresponded to the following proportions of tokens in BNC-spkn: an 2.62%, a 0.25%, the 0.12%. 
    The latter two are very low percentages, and for this reason three separate samples of each were investigated, in order to see the extent to which the samples differed. Analysis of article usage was carried out in the first instance by reading right-sorted concordance lines. Whenever doubts arose, larger contexts were retrieved from the corpus. Various reference works were also consulted, including Berry (1993), Francis et al (1998), and various corpus-based dictionaries and grammars. The data presented include: a description of the various types of lexico-phraseological unit found; the proportions of the samples judged to involve the different lexico-phraseological phenomena identified; the problems encountered when deciding whether or not phraseology is an important factor in specific instances of article usage; and the number of tokens in each sample which were in some way irrelevant, for example because they involved speaker repetition of the article, or the non-completion of a noun phrase.
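
    The sampling proportions quoted in this abstract follow from dividing the sample size of 500 by each article's token count in BNC-spkn. A minimal sketch of that arithmetic, using only the figures given in the abstract (the variable and function names are illustrative):

```python
SAMPLE_SIZE = 500  # tokens per sample, as stated in the abstract

# Token counts for each article in BNC-spkn, as stated in the abstract.
token_counts = {"an": 19_049, "a": 200_004, "the": 409_060}


def sample_share(sample_size, total_tokens):
    """Return the sample as a percentage of all tokens of that word form."""
    return 100 * sample_size / total_tokens


shares = {word: sample_share(SAMPLE_SIZE, n) for word, n in token_counts.items()}

for word, share in shares.items():
    print(f"{word}: {share:.2f}% of {token_counts[word]:,} tokens")
```

    Rounded to two decimal places, these reproduce the 2.62%, 0.25% and 0.12% quoted for AN, A and THE respectively.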