    Machine Translation using Semantic Web Technologies: A Survey

    A large number of machine translation approaches have recently been developed to facilitate the fluid migration of content across languages. However, the literature suggests that many obstacles must still be overcome to achieve better automatic translations. One of these obstacles is lexical and syntactic ambiguity. A promising way of overcoming this problem is to use Semantic Web technologies. This article presents the results of a systematic review of machine translation approaches that rely on Semantic Web technologies for translating texts. Overall, our survey suggests that while Semantic Web technologies can enhance the quality of machine translation outputs for various problems, the combination of the two is still in its infancy. Comment: 23 pages, 2 figures, 4 tables
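
    One way Semantic Web resources can help with lexical ambiguity is by supplying language-tagged labels and types for entities, as multilingual knowledge graphs do through `rdfs:label`. The sketch below is an illustration under stated assumptions, not a method taken from the survey: the `ex:` IRIs and the triples are hypothetical, and a plain in-memory list stands in for an RDF store or SPARQL endpoint. An expected entity type, assumed to come from the sentence context, selects between the two senses of "turkey".

```python
# Hypothetical in-memory triples (subject, predicate, object); labels are (text, language) pairs.
TRIPLES = [
    ("ex:Turkey_country", "rdf:type", "ex:Country"),
    ("ex:Turkey_country", "rdfs:label", ("Turkey", "en")),
    ("ex:Turkey_country", "rdfs:label", ("Turquía", "es")),
    ("ex:Turkey_bird", "rdf:type", "ex:Bird"),
    ("ex:Turkey_bird", "rdfs:label", ("turkey", "en")),
    ("ex:Turkey_bird", "rdfs:label", ("pavo", "es")),
]


def entities_with_label(term, lang):
    """Return the sorted subjects whose label in `lang` matches `term`, ignoring case."""
    return sorted({
        s for s, p, o in TRIPLES
        if p == "rdfs:label" and o[1] == lang and o[0].lower() == term.lower()
    })


def translate_term(term, src_lang, tgt_lang, expected_type=None):
    """Map a source term to target-language labels, optionally keeping only entities of `expected_type`."""
    labels = []
    for entity in entities_with_label(term, src_lang):
        if expected_type is not None and (entity, "rdf:type", expected_type) not in TRIPLES:
            continue  # wrong sense for this context
        labels.extend(
            o[0] for s, p, o in TRIPLES
            if s == entity and p == "rdfs:label" and o[1] == tgt_lang
        )
    return labels


print(translate_term("turkey", "en", "es"))                              # ['pavo', 'Turquía']
print(translate_term("turkey", "en", "es", expected_type="ex:Country"))  # ['Turquía']
```

    Without a type constraint both senses survive; the type check is what resolves the ambiguity, which is why such approaches depend on some way of inferring the expected type from context.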

    Cross-language Citation Recommendation via Hierarchical Representation Learning on Heterogeneous Graph

    While the volume of scholarly publications has increased at a frenetic pace, accessing and consuming useful candidate papers in very large digital libraries is becoming an essential and challenging task for scholars. Unfortunately, because of the language barrier, some scientists (especially junior researchers or graduate students who have not mastered other languages) cannot efficiently locate publications hosted in a foreign-language repository. In this study, we propose a novel solution to this new problem: cross-language citation recommendation via Hierarchical Representation Learning on Heterogeneous Graph (HRLHG). HRLHG learns a representation function that maps publications from multilingual repositories into a low-dimensional joint embedding space, using various kinds of vertices and relations on a heterogeneous graph. By leveraging both global (task-specific) and local (task-independent) information, together with a novel supervised hierarchical random walk algorithm, the proposed method optimizes the publication representations by maximizing the likelihood of locating important cross-language neighborhoods on the graph. Experimental results show that the proposed method not only outperforms state-of-the-art baseline models but also improves the interpretability of the representation model for the cross-language citation recommendation task. Comment: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR 2018), 635--64
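
    The abstract does not spell out HRLHG's supervised hierarchical random walk, so the following is only a generic stand-in: a random walk on a small heterogeneous graph in which each step favors relation types according to hand-set weights. Node names, relation names, and the `rel_weight` values are hypothetical; in approaches of this family, walk sequences like these are typically fed to a skip-gram-style embedding model, which is not shown.

```python
import random
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency list: node -> list of (neighbor, relation)."""
    adj = defaultdict(list)
    for u, v, relation in edges:
        adj[u].append((v, relation))
        adj[v].append((u, relation))
    return adj


def relation_weighted_walk(adj, start, length, rel_weight, rng):
    """Random walk where each neighbor is chosen with probability proportional to its relation's weight."""
    walk = [start]
    current = start
    for _ in range(length - 1):
        neighbors = adj.get(current, [])
        if not neighbors:
            break  # dead end: stop early
        weights = [rel_weight.get(rel, 1.0) for _, rel in neighbors]
        current = rng.choices([n for n, _ in neighbors], weights=weights, k=1)[0]
        walk.append(current)
    return walk


# Hypothetical toy graph mixing English and Chinese papers with keyword nodes.
edges = [
    ("paper:en:1", "paper:en:2", "cites"),
    ("paper:en:1", "kw:en:embedding", "has_keyword"),
    ("kw:en:embedding", "kw:zh:qianru", "translation_of"),
    ("paper:zh:7", "kw:zh:qianru", "has_keyword"),
]
adj = build_graph(edges)
rel_weight = {"cites": 1.0, "has_keyword": 1.0, "translation_of": 3.0}  # hand-set, not learned
rng = random.Random(42)  # fixed seed for reproducibility
walks = [relation_weighted_walk(adj, "paper:en:1", 5, rel_weight, rng) for _ in range(3)]
for w in walks:
    print(" -> ".join(w))
```

    Raising the weight of `translation_of` makes walks cross the language boundary more often, which is the kind of cross-language neighborhood the abstract says the method tries to capture.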

    AppTechMiner: Mining Applications and Techniques from Scientific Articles

    This paper presents AppTechMiner, a rule-based information extraction framework that automatically constructs a knowledge base of all application areas and problem-solving techniques. Techniques include tools, methods, datasets, and evaluation metrics. We also categorize individual research articles by their application areas and by the techniques proposed or improved in each article. Our system achieves high average precision (~82%) and recall (~84%) in knowledge base creation. It also performs well at assigning applications and techniques to individual articles (average accuracy ~66%). Finally, we present two use cases: a simple information retrieval system and an extensive temporal analysis of the usage of techniques and application areas. At present, we demonstrate the framework in the domain of computational linguistics, but it can easily be generalized to any other field of research. Comment: JCDL 2017, 6th International Workshop on Mining Scientific Publications. arXiv admin note: substantial text overlap with arXiv:1608.0638
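
    To give a feel for what rule-based extraction of (technique, application) pairs involves, here is a minimal sketch built on two regular-expression patterns. The patterns are hypothetical and are not AppTechMiner's actual rules; they are deliberately brittle, which is the usual trade-off of hand-written rules against the precision and recall figures reported above.

```python
import re
from collections import defaultdict

# Hypothetical surface patterns; these are NOT AppTechMiner's actual rules.
PATTERNS = [
    re.compile(r"(?P<tech>[A-Z][\w\- ]*?) (?:is|are) (?:used|applied) (?:for|to) (?P<app>[\w\- ]+)"),
    re.compile(r"using (?P<tech>[\w\- ]+?) for (?P<app>[\w\- ]+)"),
]
ARTICLE = re.compile(r"^(?:a|an|the) ", re.IGNORECASE)


def extract_pairs(sentence):
    """Return (technique, application) pairs matched by any pattern in one sentence."""
    pairs = []
    for pattern in PATTERNS:
        for m in pattern.finditer(sentence):
            tech = ARTICLE.sub("", m.group("tech").strip())  # drop a leading article
            pairs.append((tech, m.group("app").strip()))
    return pairs


def build_knowledge_base(sentences):
    """Aggregate extracted pairs into: application area -> set of techniques."""
    kb = defaultdict(set)
    for sentence in sentences:
        for tech, app in extract_pairs(sentence):
            kb[app].add(tech)
    return kb


sentences = [
    "Conditional random fields are used for named entity recognition.",
    "We tag tweets using a BiLSTM tagger for part-of-speech tagging.",
]
kb = build_knowledge_base(sentences)
for app, techs in sorted(kb.items()):
    print(app, "->", sorted(techs))
```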

    Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding

    Entity alignment is the task of finding entities in two knowledge bases (KBs) that represent the same real-world object. When facing KBs in different natural languages, conventional cross-lingual entity alignment methods rely on machine translation to eliminate the language barriers. These approaches often suffer from the uneven quality of translations between languages. While recent embedding-based techniques encode entities and relationships in KBs and do not need machine translation for cross-lingual entity alignment, a significant number of attributes remain largely unexplored. In this paper, we propose a joint attribute-preserving embedding model for cross-lingual entity alignment. It jointly embeds the structures of two KBs into a unified vector space and further refines it by leveraging attribute correlations in the KBs. Our experimental results on real-world datasets show that this approach significantly outperforms the state-of-the-art embedding approaches for cross-lingual entity alignment and could be complemented with methods based on machine translation.
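
    The abstract does not describe how aligned pairs are read off the unified vector space; a common choice for embedding-based alignment, assumed here, is nearest-neighbor search by cosine similarity, scored with Hits@1. The vectors below are hand-written toy values standing in for trained embeddings, and the paper's attribute-based refinement step is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def align(src_emb, tgt_emb):
    """Map each source entity to the target entity with the highest cosine similarity."""
    return {
        src: max(tgt_emb, key=lambda tgt: cosine(vec, tgt_emb[tgt]))
        for src, vec in src_emb.items()
    }


def hits_at_1(predicted, gold):
    """Fraction of gold pairs whose top-ranked prediction is correct."""
    return sum(predicted[s] == t for s, t in gold.items()) / len(gold)


# Toy vectors standing in for embeddings of two KBs in one shared space (hypothetical values).
en_kb = {"en:Berlin": [0.9, 0.1, 0.0], "en:Paris": [0.1, 0.9, 0.1]}
fr_kb = {"fr:Berlin": [0.88, 0.12, 0.05], "fr:Paris": [0.12, 0.85, 0.2]}

predicted = align(en_kb, fr_kb)
gold = {"en:Berlin": "fr:Berlin", "en:Paris": "fr:Paris"}
print(predicted)                   # {'en:Berlin': 'fr:Berlin', 'en:Paris': 'fr:Paris'}
print(hits_at_1(predicted, gold))  # 1.0
```

    Because matching happens directly between vectors, no machine translation step is needed at alignment time; the quality of the result depends entirely on how well the shared space was learned.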

    Towards an Arabic-English Machine-Translation Based on Semantic Web

    Communication tools have made the world a small village; as a consequence, people can contact others from different societies or who speak different languages. Since such communication can take place anytime and anywhere, it cannot be effective without machine translation. A number of studies have developed machine translation between English and many other languages, but Arabic has not yet been considered. Therefore, we aim to outline a roadmap for our proposed translation system, which is intended to provide enhanced Arabic-English translation based on the Semantic Web. Comment: 6 pages, 4 figures, Conference paper

    Logician: A Unified End-to-End Neural Approach for Open-Domain Information Extraction

    In this paper, we consider the problem of open information extraction (OIE): extracting entity- and relation-level intermediate structures from sentences in an open domain. We focus on four types of valuable intermediate structures (Relation, Attribute, Description, and Concept) and propose a unified knowledge expression form, SAOKE, to express them. We publicly release a data set that contains more than forty thousand sentences and the corresponding facts in the SAOKE format, labeled by crowdsourcing. To our knowledge, this is the largest publicly available human-labeled data set for open information extraction tasks. Using this labeled SAOKE data set, we train an end-to-end neural model in the sequence-to-sequence paradigm, called Logician, to transform sentences into facts. Unlike existing algorithms, which generally focus on extracting each single fact without considering other possible facts, Logician performs a global optimization over all the facts involved in a sentence, in which facts not only compete with each other to attract the attention of words but also cooperate to share words. An experimental study on various types of open-domain relation extraction tasks reveals the consistent superiority of Logician over other state-of-the-art algorithms. The experiments verify the reasonableness of the SAOKE format, the value of the SAOKE data set, the effectiveness of the proposed Logician model, and the feasibility of applying the end-to-end learning paradigm on supervised data sets to the challenging task of open information extraction.

    Sentiment/Subjectivity Analysis Survey for Languages other than English

    Subjectivity and sentiment analysis have gained considerable attention recently. Most of the resources and systems built so far are for English, and the need for systems in other languages is increasing. This paper surveys the approaches used to build subjectivity and sentiment analysis systems for languages other than English. These systems fall into three types. The first (and the best) type is language-specific systems. The second type reuses or transfers sentiment resources from English to the target language. The third type relies on language-independent methods. The paper also includes a separate section devoted to Arabic sentiment analysis. Comment: This is the accepted version of a paper in the Social Network Analysis and Mining journal. The final publication will be available at Springer via http://dx.doi.org/10.1007/s13278-016-0381-
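
    The second type, transferring English sentiment resources to the target language, can be illustrated in its crudest form: translate each entry of an English polarity lexicon through a bilingual dictionary, then score target-language text by summing word polarities. Both the lexicon scores and the English-Spanish dictionary are hypothetical toy data; real systems must also handle negation, morphology, and translations that shift a word's sense or polarity.

```python
# Hypothetical English polarity lexicon and English->Spanish dictionary.
EN_LEXICON = {"good": 1, "excellent": 2, "bad": -1, "terrible": -2}
EN_TO_ES = {
    "good": ["bueno", "buena"],
    "excellent": ["excelente"],
    "bad": ["malo", "mala"],
    "terrible": ["terrible"],
}


def transfer_lexicon(lexicon, dictionary):
    """Copy each English word's polarity to all of its dictionary translations."""
    target = {}
    for word, polarity in lexicon.items():
        for translation in dictionary.get(word, []):
            target[translation] = polarity
    return target


def polarity(sentence, lexicon):
    """Sum the polarities of known words; punctuation handling is deliberately minimal."""
    tokens = sentence.lower().replace(".", " ").replace(",", " ").split()
    return sum(lexicon.get(token, 0) for token in tokens)


ES_LEXICON = transfer_lexicon(EN_LEXICON, EN_TO_ES)
print(polarity("La comida es excelente, pero el servicio es malo.", ES_LEXICON))  # 1
```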

    Decoding-History-Based Adaptive Control of Attention for Neural Machine Translation

    Attention-based sequence-to-sequence models have proved successful in Neural Machine Translation (NMT). However, attention that does not take the decoding history into account (the past information in the decoder and the attention mechanism) often causes substantial repetition. To address this problem, we propose decoding-history-based Adaptive Control of Attention (ACA) for the NMT model. ACA learns to control the attention by tracking the decoding history and the current information with a memory vector, so that the model can take both the already-translated content and the current information into consideration. Experiments on Chinese-English and English-Vietnamese translation demonstrate that our model significantly outperforms strong baselines. The analysis shows that our model generates translations with less repetition and higher accuracy. The code will be available at https://github.com/lancopk
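
    ACA's learned memory vector is not described here in enough detail to reproduce. As a simpler, related illustration of why tracking decoding history reduces repetition, the sketch below swaps in a hand-set coverage-style penalty (this is not ACA): attention already spent on a source position is subtracted from that position's score before the softmax. The raw scores and the `penalty` value are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend_with_history(scores, history, penalty):
    """Penalize source positions by their accumulated past attention, then update the history."""
    adjusted = [s - penalty * h for s, h in zip(scores, history)]
    weights = softmax(adjusted)
    new_history = [h + w for h, w in zip(history, weights)]
    return weights, new_history


# The same raw scores at every step mimic a decoder that would keep attending to position 0.
scores = [2.0, 1.5, 0.5]
history = [0.0, 0.0, 0.0]
for step in range(2):
    weights, history = attend_with_history(scores, history, penalty=3.0)
    print(step, [round(w, 3) for w in weights], "-> focus on position", weights.index(max(weights)))
```

    With `penalty=0.0` the focus stays on position 0 at every step, which is the repetition pattern the abstract describes; with the history term, the focus moves on to position 1 at the second step.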

    Filling Knowledge Gaps in a Broad-Coverage Machine Translation System

    Knowledge-based machine translation (KBMT) techniques yield high quality in domains with detailed semantic models, limited vocabulary, and controlled input grammar. Scaling up along these dimensions means acquiring large knowledge resources. It also means behaving reasonably when definitive knowledge is not yet available. This paper describes how we can fill various KBMT knowledge gaps, often using robust statistical techniques. We describe quantitative and qualitative results from JAPANGLOSS, a broad-coverage Japanese-English MT system. Comment: 7 pages, compressed and uuencoded PostScript. To appear: IJCAI-9

    Human Translation Vs Machine Translation: the Practitioner Phenomenology

    The paper explores the current phenomenon of human translation compared with machine translation. Human translation (HT), by definition, is when a human translator, rather than a machine, translates text. It is the oldest form of translation, relying on pure human intelligence to convert one way of saying things into another. A translator is a person who performs language translation. Translation is necessary for the spread of information, knowledge, and ideas. It is absolutely necessary for effective and empathetic communication between different cultures, and it is therefore critical for social harmony and peace. Only a human translator can tell the difference, because a machine translator will simply produce a direct word-for-word translation. This is a limitation of machines: they are not yet advanced enough to render these nuances accurately and can only produce word-for-word translations. There are different translation techniques, diverse theories of translation, and eight types of translation services, including technical translation, judicial translation, and certified translation. In molecular biology, translation is the process of converting the sequence of a messenger RNA (mRNA) molecule into a sequence of amino acids during protein synthesis. The genetic code describes the relationship between the sequence of base pairs in a gene and the corresponding amino acid sequence that it encodes.