    Learning Character-level Compositionality with Visual Features

    Previous work has modeled the compositionality of words by creating character-level models of meaning, reducing problems of sparsity for rare words. However, in many writing systems compositionality has an effect even at the character level: the meaning of a character is derived from the sum of its parts. In this paper, we model this effect by creating embeddings for characters based on their visual characteristics, creating an image for the character and running it through a convolutional neural network to produce a visual character embedding. Experiments on a text classification task demonstrate that such a model allows for better processing of instances with rare characters in languages such as Chinese, Japanese, and Korean. Additionally, qualitative analyses demonstrate that our proposed model learns to focus on the parts of characters that carry semantic content, resulting in embeddings that are coherent in visual space. Comment: Accepted to ACL 201

    An Investigation Into Chinese Internet Neologisms

    With the rapid progress of the Internet and easy access to social networking, online communication has become commonplace among netizens, giving rise to Internet neologisms. These new expressions mirror emerging phenomena, recent changes, and trends in all aspects of social life, and they play an increasingly important role in social media and people’s daily lives. Internet neologisms carry rich social and cultural connotations, which makes it necessary to examine how Chinese Internet neologisms (CIN) work. This paper aims to clarify what CIN is, how it is classified, and what features and social connotations underlie its seemingly bantering lingo.

    The Generation Study About Netspeak Semantic Vagueness

    To study a linguistic phenomenon, one first needs to understand its generation mechanism. The formation of language and discourse is a complicated process: it involves coding speech from multiple angles and at multiple levels, and it is also a result of speakers’ particular psychological effects. Studying the generation of netspeak semantic vagueness is therefore important. The generative mechanism of netspeak semantic vagueness has both similarities to and differences from that of entitative language. Its distinctive and vital factor is netspeak’s variability, which is acquired in the course of its evolution and configuration: netspeak is generated through language mutation, and this metamorphosis pervades every level of netspeak, including the lexical, grammatical, and semantic levels.

    SIMPLE MALAY TO ENGLISH TRANSLATOR

    In the modern world, there is an increased need for language translation. Attempts at machine translation are as old as computers themselves. Machine translation is the attempt to automate all or part of the process of translating from one human language (the source) to another (the target). It is hard because structures in one human language often do not correspond in a simple way to structures in another. This paper presents a prototype of a Simple Malay to English Translator. The translator is developed to translate simple Malay sentences into English sentences, since not many Malay-English translators are available. The main tools used for the project development are the Java language, Forte for Java 4.0 Community Edition, and Microsoft Notepad version 5.1. According to the research done, the dictionary used by a machine translator is usually created in a Notepad file, which allows easier retrieval than Microsoft Access or another database application. The ambiguity problem is not addressed in this project. Hence, the goal of the project is to produce syntactically correct translations; semantic factors are not taken into consideration.

    TEI and LMF crosswalks

    The present paper explores various arguments in favour of making the Text Encoding Initiative (TEI) guidelines an appropriate serialisation for ISO standard 24613:2008 (LMF, Lexical Mark-up Framework). It also identifies the issues that would have to be resolved in order to reach an appropriate implementation of these ideas, in particular in terms of informational coverage. We show how the customisation facilities offered by the TEI guidelines can provide an adequate background, not only to cover missing components within the current Dictionary chapter of the TEI guidelines, but also to allow specific lexical projects to deal with local constraints. We expect this proposal to be a basis for a future ISO project in the context of the ongoing revision of LMF.

    The language of Keitai-mail: the sociolinguistics of Japanese mobile e-mail

    Syllable Segmentation, Normalization, and Lexicographic Ordering of Myanmar-Language Text Using Formal Methods

    National University Corporation Nagaoka University of Technology

    The Use of Abbreviations in English-Medium Astrophysics Research Paper Titles: A Problematic Issue

    In this study, we carry out a qualitative and quantitative analysis of abbreviations in 300 randomly collected research paper titles published in the most prestigious European and US-based English-language Astrophysics journals. Our main results show that the shortening of words and groups of words is one of the most characteristic and recurrent features of title construction in Astrophysics research papers. Despite the convenience of abbreviations as a word-formation mechanism, some of them may be hard to understand or open to misinterpretation because of their specificity, ambiguity, or overlap. To overcome these difficulties, we propose a series of options that would undoubtedly lead to better interaction among the different branches of Astrophysics in particular, and of science in general, and would improve how research is currently performed and communicated.