12,684 research outputs found

Machine Translation using Semantic Web Technologies: A Survey

Author: Moussallem Diego
Ngomo Axel-Cyrille Ngonga
Wauer Matthias
Publication venue: 'Elsevier BV'
Publication date: 17/07/2018

A large number of machine translation approaches have recently been developed to facilitate the fluid migration of content across languages. However, the literature suggests that many obstacles must still be dealt with to achieve better automatic translations. One of these obstacles is lexical and syntactic ambiguity. A promising way of overcoming this problem is using Semantic Web technologies. This article presents the results of a systematic review of machine translation approaches that rely on Semantic Web technologies for translating texts. Overall, our survey suggests that while Semantic Web technologies can enhance the quality of machine translation outputs for various problems, the combination of both is still in its infancy. Comment: 23 pages, 2 figures, 4 table

arXiv.org e-Print Archive

Cross-language Citation Recommendation via Hierarchical Representation Learning on Heterogeneous Graph

Author: Gao Liangcai
Jiang Zhuoren
Liu Xiaozhong
Lu Yao
Yin Yue
Publication date: 31/12/2018

While the volume of scholarly publications has increased at a frenetic pace, accessing and consuming the useful candidate papers in very large digital libraries is becoming an essential and challenging task for scholars. Unfortunately, because of the language barrier, some scientists (especially junior ones or graduate students who have not mastered other languages) cannot efficiently locate the publications hosted in a foreign language repository. In this study, we propose a novel solution, cross-language citation recommendation via Hierarchical Representation Learning on Heterogeneous Graph (HRLHG), to address this new problem. HRLHG can learn a representation function by mapping the publications, from multilingual repositories, to a low-dimensional joint embedding space from various kinds of vertexes and relations on a heterogeneous graph. By leveraging both global (task specific) plus local (task independent) information as well as a novel supervised hierarchical random walk algorithm, the proposed method can optimize the publication representations by maximizing the likelihood of locating the important cross-language neighborhoods on the graph. Experiment results show that the proposed method can not only outperform state-of-the-art baseline models, but also improve the interpretability of the representation model for the cross-language citation recommendation task. Comment: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR 2018), 635--64

arXiv.org e-Print Archive

AppTechMiner: Mining Applications and Techniques from Scientific Articles

Author: Agarwal Sanyam
Dan Soham
Goyal Pawan
Mukherjee Animesh
Singh Mayank
Publication date: 10/11/2017

This paper presents AppTechMiner, a rule-based information extraction framework that automatically constructs a knowledge base of all application areas and problem solving techniques. Techniques include tools, methods, datasets or evaluation metrics. We also categorize individual research articles based on their application areas and the techniques proposed/improved in the article. Our system achieves high average precision (~82%) and recall (~84%) in knowledge base creation. It also performs well in application and technique assignment to an individual article (average accuracy ~66%). In the end, we further present two use cases presenting a trivial information retrieval system and an extensive temporal analysis of the usage of techniques and application areas. At present, we demonstrate the framework for the domain of computational linguistics but this can be easily generalized to any other field of research. Comment: JCDL 2017, 6th International Workshop on Mining Scientific Publications. arXiv admin note: substantial text overlap with arXiv:1608.0638

arXiv.org e-Print Archive

Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding

Author: D Spohr
J Duchi
P Ristoski
Y Hao
Publication date: 25/09/2017

Entity alignment is the task of finding entities in two knowledge bases (KBs) that represent the same real-world object. When facing KBs in different natural languages, conventional cross-lingual entity alignment methods rely on machine translation to eliminate the language barriers. These approaches often suffer from the uneven quality of translations between languages. While recent embedding-based techniques encode entities and relationships in KBs and do not need machine translation for cross-lingual entity alignment, a significant number of attributes remain largely unexplored. In this paper, we propose a joint attribute-preserving embedding model for cross-lingual entity alignment. It jointly embeds the structures of two KBs into a unified vector space and further refines it by leveraging attribute correlations in the KBs. Our experimental results on real-world datasets show that this approach significantly outperforms the state-of-the-art embedding approaches for cross-lingual entity alignment and could be complemented with methods based on machine translation

arXiv.org e-Print Archive

Crossref

Towards an Arabic-English Machine-Translation Based on Semantic Web

Author: Al-Baltah Ibrahim Ahmed
Al-gapheri Ghaleb H.
Ba-Alwi Fadl Mutaher
Dahan Neama Abdulaziz
Publication date: 14/09/2017

Communication tools make the world like a small village, and as a consequence people can contact others who are from different societies or who speak different languages. This communication cannot happen effectively without Machine Translation tools, because they can be found anytime and everywhere. A number of studies have developed Machine Translation between English and many other languages, but Arabic has not yet been considered. Therefore, we aim to highlight a roadmap for our proposed translation machine to provide an enhanced Arabic-English translation based on Semantic. Comment: 6 pages, 4 figures, Conference pape

arXiv.org e-Print Archive

Logician: A Unified End-to-End Neural Approach for Open-Domain Information Extraction

Author: Fan Miao
Feng Yue
Li Ping
Li Xu
Sun Mingming
Wang Xin
Publication date: 29/04/2019

In this paper, we consider the problem of open information extraction (OIE) for extracting entity and relation level intermediate structures from sentences in open-domain. We focus on four types of valuable intermediate structures (Relation, Attribute, Description, and Concept), and propose a unified knowledge expression form, SAOKE, to express them. We publicly release a data set which contains more than forty thousand sentences and the corresponding facts in the SAOKE format labeled by crowd-sourcing. To our knowledge, this is the largest publicly available human labeled data set for open information extraction tasks. Using this labeled SAOKE data set, we train an end-to-end neural model using the sequence-to-sequence paradigm, called Logician, to transform sentences into facts. For each sentence, different to existing algorithms which generally focus on extracting each single fact without concerning other possible facts, Logician performs a global optimization over all possible involved facts, in which facts not only compete with each other to attract the attention of words, but also cooperate to share words. An experimental study on various types of open domain relation extraction tasks reveals the consistent superiority of Logician to other state-of-the-art algorithms. The experiments verify the reasonableness of SAOKE format, the valuableness of SAOKE data set, the effectiveness of the proposed Logician model, and the feasibility of the methodology to apply end-to-end learning paradigm on supervised data sets for the challenging tasks of open information extraction

arXiv.org e-Print Archive

Sentiment/Subjectivity Analysis Survey for Languages other than English

Author: Aljadda Khalifeh
Crandall David
Korayem Mohammed
Publication date: 25/08/2016

Subjectivity and sentiment analysis have gained considerable attention recently. Most of the resources and systems built so far are for English. The need for designing systems for other languages is increasing. This paper surveys different ways used for building systems for subjectivity and sentiment analysis for languages other than English. There are three different types of approaches used for building these systems. The first (and the best) one is the language specific systems. The second type of systems involves reusing or transferring sentiment resources from English to the target language. The third type of methods is based on using language independent methods. The paper presents a separate section devoted to Arabic sentiment analysis. Comment: This is an accepted version in Social Network Analysis and Mining journal. The final publication will be available at Springer via http://dx.doi.org/10.1007/s13278-016-0381-

arXiv.org e-Print Archive

Decoding-History-Based Adaptive Control of Attention for Neural Machine Translation

Author: Lin Junyang
Ma Shuming
Su Qi
Sun Xu
Publication date: 06/02/2018

Attention-based sequence-to-sequence model has proved successful in Neural Machine Translation (NMT). However, the attention without consideration of decoding history, which includes the past information in the decoder and the attention mechanism, often causes much repetition. To address this problem, we propose the decoding-history-based Adaptive Control of Attention (ACA) for the NMT model. ACA learns to control the attention by keeping track of the decoding history and the current information with a memory vector, so that the model can take the translated contents and the current information into consideration. Experiments on Chinese-English translation and the English-Vietnamese translation have demonstrated that our model significantly outperforms the strong baselines. The analysis shows that our model is capable of generating translation with less repetition and higher accuracy. The code will be available at https://github.com/lancopk

arXiv.org e-Print Archive

Filling Knowledge Gaps in a Broad-Coverage Machine Translation System

Author: Chander Ishwar
Haines Matthew
Hatzivassiloglou Vasileios
Hovy Eduard
Iida Masayo
Knight Kevin
Luk Steve K.
Whitney Richard
Yamada Kenji
Publication date: 01/01/1995

Knowledge-based machine translation (KBMT) techniques yield high quality in domains with detailed semantic models, limited vocabulary, and controlled input grammar. Scaling up along these dimensions means acquiring large knowledge resources. It also means behaving reasonably when definitive knowledge is not yet available. This paper describes how we can fill various KBMT knowledge gaps, often using robust statistical techniques. We describe quantitative and qualitative results from JAPANGLOSS, a broad-coverage Japanese-English MT system. Comment: 7 pages, Compressed and uuencoded postscript. To appear: IJCAI-9

arXiv.org e-Print Archive

CiteSeerX

Human Translation Vs Machine Translation: the Practitioner Phenomenology

Author: Xeauyin L. (Liming)
Xiu P. (Peng)
Publication venue: American Linguist Association
Publication date: 01/01/2018

The paper aimed at exploring the current phenomenon regarding human translation with machine translation. Human translation (HT), by definition, is when a human translator, rather than a machine, translates text. It's the oldest form of translation, relying on pure human intelligence to convert one way of saying things to another. The person who performs language translation. Learn more about using technology to reduce healthcare disparity. A person who performs language translation. The translation is necessary for the spread of information, knowledge, and ideas. It is absolutely necessary for effective and empathetic communication between different cultures. Translation, therefore, is critical for social harmony and peace. Only a human translation can tell the difference because the machine translator will just do the direct word to word translation. This is a hindrance to machines because they are not advanced to the level of rendering these nuances accurately, but they can only do word to word translations. There are different translation techniques, diverse theories about translation and eight different translation services types, including technical translation, judicial translation and certified translation. The translation is the process of translating the sequence of a messenger RNA (mRNA) molecule to a sequence of amino acids during protein synthesis. The genetic code describes the relationship between the sequence of base pairs in a gene and the corresponding amino acid sequence that it encodes

Neliti