Learning Character-level Compositionality with Visual Features
Previous work has modeled the compositionality of words by creating
character-level models of meaning, reducing problems of sparsity for rare
words. However, in many writing systems compositionality has an effect even on
the character-level: the meaning of a character is derived by the sum of its
parts. In this paper, we model this effect by creating embeddings for
characters based on their visual characteristics, creating an image for the
character and running it through a convolutional neural network to produce a
visual character embedding. Experiments on a text classification task
demonstrate that such a model allows for better processing of instances with rare
characters in languages such as Chinese, Japanese, and Korean. Additionally,
qualitative analyses demonstrate that our proposed model learns to focus on the
parts of characters that carry semantic content, resulting in embeddings that
are coherent in visual space. Comment: Accepted to ACL 201
An Investigation Into Chinese Internet Neologisms
With the rapid progress of the Internet and easy access to social networking, online communication has become commonplace among netizens, giving rise to Internet neologisms. These new expressions mirror emerging phenomena, changes, and trends in all aspects of social life, and play an increasingly important role in social media and people's daily lives. Internet neologisms carry profound social and cultural connotations, which makes it necessary to inquire into the workings of Chinese Internet neologisms (CIN). This paper aims to clarify what CIN is, how it is classified, and what features and social connotations underlie its seemingly bantering expressions
The Generation Study About Netspeak Semantic Vagueness
To study a linguistic phenomenon, one first needs to understand the mechanism that generates it. The formation of language and discourse is a complicated process: it is not only a process of encoding speech from multiple angles and at multiple levels, but also a result of speakers' particular psychological effects. It is therefore important to study how netspeak semantic vagueness is generated. The generative mechanism of netspeak semantic vagueness has both similarities to and differences from that of the entitative language. The distinctive and vital factor in this mechanism is netspeak's variability, which it acquires in the course of its evolution and configuration: netspeak is generated through language mutation, and this metamorphosis pervades every level of netspeak, including the lexical, grammatical, and semantic levels
SIMPLE MALAY TO ENGLISH TRANSLATOR
In the modern world, there is an increased need for language translation. Attempts at
language translation are as old as computers themselves. Machine translation is the
attempt to automate all or part of the process of translating from one human language
(the source) to another (the target). Machine translation is hard because structures in one
human language often do not correspond in a simple way to structures in another. This
paper presents a prototype of a Simple Malay to English Translator. The translator is
developed to translate simple Malay sentences into English sentences, since not many
Malay-English translators are available. The main tools used for the project
development are the Java language, Forte for Java 4.0 Community Edition, and Microsoft
Notepad version 5.1. According to the research done, the dictionary used by a machine
translator is usually created in a Notepad file, which allows easier retrieval than using
Microsoft Access or another database application. The ambiguity problem is not
addressed in this project. Hence, the goal of the project is to produce syntactically correct
translations; semantic factors are not taken into consideration
TEI and LMF crosswalks
The present paper explores various arguments in favour of making the Text
Encoding Initiative (TEI) guidelines an appropriate serialisation for ISO
standard 24613:2008 (LMF, Lexical Markup Framework). It also identifies the
issues that would have to be resolved in order to reach an appropriate
implementation of these ideas, in particular in terms of informational
coverage. We show how the customisation facilities offered by the TEI
guidelines can provide an adequate background, not only to cover missing
components within the current Dictionary chapter of the TEI guidelines, but
also to allow specific lexical projects to deal with local constraints. We
expect this proposal to be a basis for a future ISO project in the context of
the ongoing revision of LMF
Syllable Segmentation, Normalization, and Lexicographic Ordering of Myanmar Text Using Formal Methods
National University Corporation Nagaoka University of Technology
The Use of Abbreviations in English-Medium Astrophysics Research Paper Titles: A Problematic Issue
In this study, we carry out a qualitative and quantitative analysis of abbreviations in 300 randomly collected titles of research papers published in English in the most prestigious European and US-based Astrophysics journals. Our main results show that shortening words and groups of words is one of the most characteristic and recurrent features of title construction in Astrophysics research papers. Despite the convenience of abbreviations as a word-formation mechanism, some of them may be difficult to understand or may be misinterpreted because of their specificity, ambiguity, or overlap. To overcome these difficulties, we propose a series of options that would lead to better interaction among the different branches of Astrophysics in particular and of science in general, and would improve how research is currently performed and communicated
A Study on Candidate Retrieval and Ranking Methods for Entity Linking
Tohoku University, Kentaro Inui (乾健太郎), 課