
    HiPHET: A Hybrid Approach to Translate Code Mixed Language (Hinglish) to Pure Languages (Hindi and English)

    Bilingual code-mixed (hybrid) languages have become very popular in India as a result of the spread of Western technology in the form of television, the Internet and social media. Because of this increased use of code-mixed languages in day-to-day communication, a need to maintain the integrity of Indian languages has arisen. To address this need, a tool named Hinglish to Pure Hindi and English Translator was developed. The tool translates in three ways: Hinglish to pure Hindi and pure English, pure Hindi to pure English, and vice versa. With Hinglish input sentences, the tool achieved an accuracy of 91% for Hindi output and 84% for English output. The paper also compares the tool with another similar tool.

    Survey on Hinglish to English Translation and Classification Techniques

    Code-mixing is the use of more than one language in a single sentence and occurs widely in multilingual communities. It is particularly prevalent in social media text. Because social networking sites are so widely used, a substantial amount of unstructured text is produced. Hinglish, i.e. code-mixed Hindi and English, occurs frequently in everyday language use in India. Hence, a translation process is required to help monolingual users and to help language processing models comprehend such text. In this paper, we study effective techniques for classification and translation tasks and identify gaps and challenges in the current research domain. After comparing a few existing machine translation methodologies, we propose a framework that improves on previous methods in the translation task.

    Identification of monolingual and code-switch information from English-Kannada code-switch data

    Code-switching is a very common occurrence in social media communication, predominantly in multilingual countries like India. Using more than one language in communication is known as code-switching or code-mixing. Important applications of code-switch processing include machine translation (MT), shallow parsing, dialog systems, and semantic parsing. Identifying code-switch and monolingual information is useful for better communication on online networking websites. In this paper, we apply a character-level n-gram approach to identify monolingual and code-switch information in English-Kannada social media data. We compare several machine learning techniques, namely naïve Bayes (NB), support vector classifier (SVC), logistic regression (LR) and neural network (NN), on English-Kannada code-switch (EKCS) data. The character-level n-gram approach improves accuracy by 1.8% to 4.1% and F1-score by 1.6% to 3.8%. We also observe that, with character-level n-grams, SVC and NN outperform the other techniques, reaching an accuracy of 97.9% and an F1-score of 98%.
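    The character-level n-gram idea in the English-Kannada abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it is a from-scratch multinomial naive Bayes (one of the classifiers the abstract lists) over character n-grams of length 1 to 3 with add-one smoothing. The label names `en`, `kn` and `cs`, the function `char_ngrams`, the class `CharNgramNB`, and the romanized toy sentences are all hypothetical and are not taken from the EKCS dataset.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max.

    The text is lowercased and padded with one space on each side so that
    word-initial and word-final character patterns also become features.
    """
    padded = f" {text.lower()} "
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


class CharNgramNB:
    """Multinomial naive Bayes over character n-gram counts (add-alpha smoothing)."""

    def __init__(self, n_min=1, n_max=3, alpha=1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # number of documents per class
        self.gram_counts = defaultdict(Counter)    # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n_min, self.n_max)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        # Total n-gram tokens seen per class (denominator of P(gram | class)).
        self.totals = {c: sum(cnt.values()) for c, cnt in self.gram_counts.items()}
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n_min, self.n_max)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # log P(c) + sum of log P(gram | c), smoothed so unseen n-grams are not zero.
            score = math.log(n_c / n_docs)
            denom = self.totals[c] + self.alpha * v
            for g in grams:
                score += math.log((self.gram_counts[c][g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical toy examples (romanized Kannada), NOT the EKCS dataset.
TRAIN = [
    ("please call me later", "en"),
    ("the meeting is over", "en"),
    ("nanu mane ge hogthini", "kn"),
    ("oota aaytha", "kn"),
    ("nanu office ge hogthini", "cs"),
    ("lunch aaytha", "cs"),
]

if __name__ == "__main__":
    texts, labels = zip(*TRAIN)
    clf = CharNgramNB().fit(texts, labels)
    print(clf.predict("call maadi later"))
```

    With only six toy sentences the prediction is not meaningful; the point is the feature extraction. In practice the same features are commonly built with scikit-learn's `CountVectorizer(analyzer="char_wb", ngram_range=(1, 3))` and fed to any of the classifiers the abstract compares, which makes it straightforward to swap NB for SVC or LR.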