331 research outputs found

    Ontology-Based Machine Translation for Bengali as a Low-Resource Language

    In this research we propose ontology-based machine translation with the help of WordNet and the UNL Ontology. Example-Based Machine Translation (EBMT) for a low-resource language such as Bengali suffers from low coverage: because parallel corpora are scarce, the system is very likely to encounter unknown words. We have implemented an EBMT system for a low-resource language pair. The EBMT architecture uses chunk-string templates (CSTs) and an unknown-word translation mechanism. A CST consists of a chunk in the source language, a string in the target language, and word alignment information. CSTs are prepared automatically from an aligned parallel corpus and WordNet using an English chunker. First, source-language chunks are generated automatically with the OpenNLP chunker. An initial CST is then generated for each source-language chunk, and CST alignments for all target sentences are generated from the parallel corpus. Next, the system generates combinations of CSTs using the word alignment information. Finally, the CSTs are generalized with WordNet to achieve wide coverage. For unknown-word translation, we use the WordNet hypernym tree and an English-Bengali dictionary. The proposed system first tries to find semantically related English words in WordNet for the unknown word. From these related words, it chooses the semantically closest word whose Bengali translation exists in the English-Bengali dictionary. If no Bengali translation exists, the system uses IPA-based transliteration. For proper nouns, the system uses the Akkhor transliteration mechanism. CSTs improved wide coverage by 57 points and quality by 48.81 points in human evaluation. Currently, 64.29% of the test-set translations produced by the system were acceptable. The combined solution of CSTs and the unknown-word mechanism generated 67.85% acceptable translations from the test set. The unknown-word mechanism improved translation quality by 3.56 points in human evaluation.
This research also proposes a way to automatically generate an explanation of each concept using the semantic background provided by the UNL Ontology. The input to this system is a single Universal Word (UW), and the output is an explanation of that UW in English or Japanese. Given a UW, the system first finds its SemanticWordMap, which contains all direct and indirect reference relations for that UW in the UNL Ontology; the input of this step is therefore one UW and the output is a WordMap graph. In the next step, conversion rules transform the WordMap graph into UNL. Depending on the user's request, the conversion rules can be specified as "From UWs only" or "From UNL Ontology"; the input of this step is a WordMap graph and the output is a UNL expression. In the final step, the UNL DeConverter converts the UNL expression into natural language. These explanations were found to be useful for improving the translation quality of unknown words. University of Electro-Communications 201
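The unknown-word fallback described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy `HYPERNYM` links stand in for the WordNet hypernym tree, `DICTIONARY` is a hypothetical English-Bengali dictionary with romanized placeholder entries, "semantically closest" is assumed to mean shortest path in the tree, and `transliterate` is only a placeholder for the IPA-based step.

```python
from collections import deque

# Toy hypernym links (child -> parent); a stand-in for the WordNet hypernym tree.
HYPERNYM = {
    "poodle": "dog",
    "terrier": "dog",
    "dog": "canine",
    "canine": "animal",
}

# Hypothetical English-Bengali dictionary with romanized entries (illustrative only).
DICTIONARY = {"dog": "kukur", "animal": "prani"}


def related_words(word):
    """Yield (related_word, distance) pairs in order of increasing tree distance."""
    # Build undirected neighbours so hypernyms, hyponyms and siblings are reachable.
    neighbours = {}
    for child, parent in HYPERNYM.items():
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    seen = {word}
    queue = deque([(word, 0)])
    while queue:
        current, dist = queue.popleft()
        if dist > 0:
            yield current, dist
        for nxt in sorted(neighbours.get(current, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))


def transliterate(word):
    """Placeholder for the IPA-based transliteration fallback."""
    return f"<translit:{word}>"


def translate_unknown(word):
    """Translate via the closest related word that has a dictionary entry."""
    if word in DICTIONARY:
        return DICTIONARY[word]
    for candidate, _dist in related_words(word):
        if candidate in DICTIONARY:
            return DICTIONARY[candidate]
    # No related word has a translation: fall back to transliteration.
    return transliterate(word)
```

Under these assumptions, `translate_unknown("poodle")` borrows the entry for its hypernym `dog`, while a word with no links in the toy tree falls through to the transliteration placeholder.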

    Diagnosing Reading strategies: Paraphrase Recognition

    Paraphrase recognition is a form of natural language processing used in tutoring, question answering, and information retrieval systems. The context of the present work is an automated reading strategy trainer called iSTART (Interactive Strategy Trainer for Active Reading and Thinking). The ability to recognize the use of paraphrase (a complete, partial, or inaccurate paraphrase, with or without extra information) in the student's input is essential if the trainer is to give appropriate feedback. I analyzed the most common patterns of paraphrase and developed a means of representing the semantic structure of sentences. Paraphrases are recognized by transforming sentences into this representation and comparing them. To construct a precise semantic representation, it is important to understand the meaning of prepositions. Adding preposition disambiguation to the original system improved its accuracy by 20%. The preposition sense disambiguation module itself achieves about 80% accuracy for the top 10 most frequently used prepositions. The main contributions of this work to the research community are the preposition classification and generalized preposition disambiguation processes, which are integrated into the paraphrase recognition system and are shown to be quite effective. The recognition model also forms a significant part of this contribution. The present effort includes the modeling of the paraphrase recognition process, featuring the Syntactic-Semantic Graph as a sentence representation; the implementation of a significant portion of this design, demonstrating its effectiveness; the modeling of an effective preposition classification based on prepositional usage; the design of the generalized preposition disambiguation module; and the integration of the preposition disambiguation module into the paraphrase recognition system so as to gain significant improvement.

    A Model for Concept Extraction and Context Establishment in Knowledge-Based Systems

    Thesis (doctorate) - Universidade Federal de Santa Catarina, Centro Tecnológico. Programa de Pós-Graduação em Engenharia de Produção. Information Retrieval systems usually work with keyword-based technologies. Although such systems achieve satisfactory results, they are unable to answer more complex queries formulated by users. Knowledge-Based Systems address this need by using ontologies to represent the knowledge embedded in texts. The most advanced ontology construction techniques currently rely on the participation of three actors: the knowledge engineer, the domain expert, and the systems analyst. This work is time-consuming, given the numerous studies that must be carried out to determine which elements should be part of the knowledge base and how they interrelate. Using computational systems that at least speed up this work is therefore fundamental to creating systems for the market. This thesis presents a model that allows knowledge representation to be carried out directly by the computer, requiring minimal or even no intervention from the human user, broadening the range of domains a system can maintain and making it more efficient and easier to use.

    RFID Technology in Intelligent Tracking Systems in Construction Waste Logistics Using Optimisation Techniques

    Construction waste disposal is an urgent issue for protecting our environment. This paper proposes a waste management system and illustrates the work process using plasterboard waste as an example. Plasterboard creates a hazardous gas when landfilled with household waste, and its recycling rate is less than 10% in the UK. The proposed system integrates RFID technology, Rule-Based Reasoning, Ant Colony optimization and knowledge technology for auditing and tracking plasterboard waste, guiding the operation staff, arranging vehicles and planning schedules, and it also provides evidence to verify disposal. It relies on RFID equipment for collecting logistical data and uses digital imaging equipment to give further evidence; the reasoning core in the third layer is responsible for generating schedules, route plans and guidance, and the last layer delivers the results to users. The paper first introduces the current plasterboard disposal situation and addresses the logistical problem that is now the main barrier to a higher recycling rate, followed by a discussion of the proposed system in terms of both system-level structure and process structure. Finally, an example scenario is given to illustrate the system's use.

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on parallel data, and phrases that are not present in the training data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, using external morphological resources instead. A set of new phrase associations is added to the translation and reordering models; each of them corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translation, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction in out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model. JRC.G.2-Global security and crisis managemen
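The idea of adding phrase associations for morphological variants can be illustrated with a small sketch. Everything here is an assumption made for illustration: the lexicons `EN_LEXICON` and `FR_LEXICON`, the restriction to one-word phrases, the Jaccard overlap of tag sets used as the similarity score, and the `threshold` value. The abstract does not specify the paper's actual score or data structures.

```python
# Hypothetical morphological lexicons: surface form -> (lemma, morphosyntactic tags).
EN_LEXICON = {
    "house": ("house", frozenset({"noun", "sg"})),
    "houses": ("house", frozenset({"noun", "pl"})),
}
FR_LEXICON = {
    "maison": ("maison", frozenset({"noun", "fem", "sg"})),
    "maisons": ("maison", frozenset({"noun", "fem", "pl"})),
}


def variants(form, lexicon):
    """Return all surface forms sharing the lemma of `form` (including itself)."""
    if form not in lexicon:
        return []
    lemma = lexicon[form][0]
    return sorted(f for f, (l, _tags) in lexicon.items() if l == lemma)


def tag_similarity(tags_a, tags_b):
    """Jaccard overlap of two tag sets; an assumed stand-in for the paper's score."""
    return len(tags_a & tags_b) / len(tags_a | tags_b)


def expand(phrase_table, threshold=0.5):
    """Add associations between morphological variants of existing one-word pairs.

    New entries store the similarity score, not a translation probability.
    """
    expanded = dict(phrase_table)
    for (src, tgt) in phrase_table:
        for s in variants(src, EN_LEXICON):
            for t in variants(tgt, FR_LEXICON):
                if (s, t) in expanded:
                    continue
                score = tag_similarity(EN_LEXICON[s][1], FR_LEXICON[t][1])
                if score >= threshold:
                    expanded[(s, t)] = score
    return expanded
```

Starting from a table that contains only `("house", "maison")`, this sketch adds `("houses", "maisons")` because their tags agree on number, and rejects the mismatched singular/plural pairings.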

    Spatial Thinking in the Engineering Curriculum: an Investigation of the Relationship Between Problem Solving and Spatial Skills Among Engineering Students.

    Long considered a primary factor of intelligence, spatial ability has been shown to correlate strongly with success in engineering education, yet it is rarely included as a learning outcome in engineering programmes. A clearer understanding of how and why spatial ability impacts on performance in science, technology, engineering and mathematics (STEM) subjects would allow educators to determine whether spatial skills development merits greater priority in STEM curricula. The aim of this study is to help inform that debate by shedding new light on the role of spatial thinking in STEM learning and to allow teaching practice and curriculum design to be informed by evidence-based research. A cross-cutting theme in STEM education, problem solving, is examined with respect to its relationship with spatial ability. Several research questions were addressed that related to the role and relevance of spatial ability to first-year engineering education and, more specifically, the manner in which spatial ability is manifest in the representation and solution of word story problems in mathematics. Working with samples of engineering students in Ireland and the United States, data were collected in the form of responses to spatial ability tests and problem-solving exercises in the areas of mathematics and electric circuits. Following a pilot study to select and refine a set of mathematical story problems, a mixed-methods design was followed in which data were first analysed using quantitative methods to highlight phenomena that were then explored using an interpretive approach. With regard to engineering education in general, it was found that spatial ability cannot be assumed to improve as students progress through an engineering programme, and that spatial ability is highly relevant to assessments that require reasoning about concepts, novel scenarios and problems but can remain hidden in overall course grades, possibly due to an emphasis on assessing rote learning.
With regard to problem solving, spatial ability was found to have a significant relationship with the problem representation step but not with the solution step. Those with high levels of spatial ability were better able to apply linguistic and schematic knowledge in the problem representation phase, which led to higher success rates in translating word statements to mathematical form.

    A Frame-Based Approach for Integrating Heterogeneous Knowledge Sources

    Communication theory and the construction of meaning : a constructive developmental approach

    In recent years the field of communication has been experiencing a movement toward newer non-traditional approaches to the study of communication and information. Among these newer approaches is a growing body of research that focuses on interpretive behavior in the communication process. Brenda Dervin's Sense-Making model of communication/information has been the most widely used interpretive theory of information to date. Sense-Making focuses primarily upon the role of the receiver in the communication process and how individuals construct meaning in specific situations. As a result, Sense-Making has not attended adequately to larger shared frameworks of meaning and the effects that they have upon information seeking and use. It is the purpose of this thesis to strengthen Dervin's theory of Sense-Making by gaining a deeper view of the individual in the construction process while broadening the meaning-making context to include structural concerns. The work of William Perry on cognitive and ethical development will be examined and applied to Sense-Making theory and data to provide a more in-depth understanding of how individuals construct meaning and use information. As a framework for examining shared structures of meaning, James Fowler's theory of faith development has also been applied to Sense-Making theory and data, with particular emphasis on relational aspects. These theories are applied to Sense-Making in an effort to develop a more complete view of the individual in the communication process.