22 research outputs found
Analogical Learner For Natural Language Processing Based On Structured String-tree Correspondence (sstc) And Case-based Reasoning
Mesin terjemahan melalui contoh menggunakan contoh penterjemahan yang seiras yang didapati daripada bank pengetahuan dua bahasa (BKB).
Example-Based Machine Translation (EBMT) is using the similar translation examples which are retrieved from the Bilingual Knowledge Bank (BKB) to translate an input sentence
Adapting An Existing Example-Based Machine Translation (EBMT) System For New Language Pairs Based On An Optimized Bilingual Knowledge Bank (BKB).
Sourcing for large amount of text and translating them are some of the challenges in building an Example-Based Machine Translation (EBMT) system. These big amounts of translated texts are annotated into the S-SSTC format to cover an extensive vocabulary and sentence structures. However, the Bilingual Knowledge Bank (BKB), which is a collection of the S-SSTCs, will normally contain redundancy. Hence, the
idea of an optimized BKB is born. An optimized BKB (redundancy reduced; is smaller in size but is as equally extensive in term of its sentence structure coverage
compared to an un-optimized BKB. Therefore, an optimized BKB enhances the performance of the EBMT. In this paper, we introduce the idea of an optimized BKB
and propose it to be re-used to effectively construct new BKBs in order to adapt an existing EBMT for new language pairs
ANNOTATED DISJUNCT FOR MACHINE TRANSLATION
Most information found in the Internet is available in English version. However,
most people in the world are non-English speaker. Hence, it will be of great advantage
to have reliable Machine Translation tool for those people. There are many
approaches for developing Machine Translation (MT) systems, some of them are
direct, rule-based/transfer, interlingua, and statistical approaches. This thesis focuses
on developing an MT for less resourced languages i.e. languages that do not have
available grammar formalism, parser, and corpus, such as some languages in South
East Asia. The nonexistence of bilingual corpora motivates us to use direct or transfer
approaches. Moreover, the unavailability of grammar formalism and parser in the
target languages motivates us to develop a hybrid between direct and transfer
approaches. This hybrid approach is referred as a hybrid transfer approach. This
approach uses the Annotated Disjunct (ADJ) method. This method, based on Link
Grammar (LG) formalism, can theoretically handle one-to-one, many-to-one, and
many-to-many word(s) translations. This method consists of transfer rules module
which maps source words in a source sentence (SS) into target words in correct
position in a target sentence (TS). The developed transfer rules are demonstrated on
English → Indonesian translation tasks. An experimental evaluation is conducted to
measure the performance of the developed system over available English-Indonesian
MT systems. The developed ADJ-based MT system translated simple, compound, and
complex English sentences in present, present continuous, present perfect, past, past
perfect, and future tenses with better precision than other systems, with the accuracy
of 71.17% in Subjective Sentence Error Rate metric
ANNOTATED DISJUNCT FOR MACHINE TRANSLATION
Most information found in the Internet is available in English version. However,
most people in the world are non-English speaker. Hence, it will be of great advantage
to have reliable Machine Translation tool for those people. There are many
approaches for developing Machine Translation (MT) systems, some of them are
direct, rule-based/transfer, interlingua, and statistical approaches. This thesis focuses
on developing an MT for less resourced languages i.e. languages that do not have
available grammar formalism, parser, and corpus, such as some languages in South
East Asia. The nonexistence of bilingual corpora motivates us to use direct or transfer
approaches. Moreover, the unavailability of grammar formalism and parser in the
target languages motivates us to develop a hybrid between direct and transfer
approaches. This hybrid approach is referred as a hybrid transfer approach. This
approach uses the Annotated Disjunct (ADJ) method. This method, based on Link
Grammar (LG) formalism, can theoretically handle one-to-one, many-to-one, and
many-to-many word(s) translations. This method consists of transfer rules module
which maps source words in a source sentence (SS) into target words in correct
position in a target sentence (TS). The developed transfer rules are demonstrated on
English → Indonesian translation tasks. An experimental evaluation is conducted to
measure the performance of the developed system over available English-Indonesian
MT systems. The developed ADJ-based MT system translated simple, compound, and
complex English sentences in present, present continuous, present perfect, past, past
perfect, and future tenses with better precision than other systems, with the accuracy
of 71.17% in Subjective Sentence Error Rate metric
Using semantic clustering to support situation awareness on Twitter: The case of World Views
In recent years, situation awareness has been recognised as a critical part of effective decision making, in particular for crisis management. One way to extract value and allow for better situation awareness is to develop a system capable of analysing a dataset of multiple posts, and clustering consistent posts into different views or stories (or, world views). However, this can be challenging as it requires an understanding of the data, including determining what is consistent data, and what data corroborates other data. Attempting to address these problems, this article proposes Subject-Verb-Object Semantic Suffix Tree Clustering (SVOSSTC) and a system to support it, with a special focus on Twitter content. The novelty and value of SVOSSTC is its emphasis on utilising the Subject-Verb-Object (SVO) typology in order to construct semantically consistent world views, in which individuals---particularly those involved in crisis response---might achieve an enhanced picture of a situation from social media data. To evaluate our system and its ability to provide enhanced situation awareness, we tested it against existing approaches, including human data analysis, using a variety of real-world scenarios. The results indicated a noteworthy degree of evidence (e.g., in cluster granularity and meaningfulness) to affirm the suitability and rigour of our approach. Moreover, these results highlight this article's proposals as innovative and practical system contributions to the research field
Idiom treatment experiments in machine translation
Idiomatic expressions pose a particular challenge for the today\u27;s Machine Translation systems, because their translation mostly does not result literally, but logically. The present dissertation shows, how with the help of a corpus, and morphosyntactic rules, such idiomatic expressions can be recognized and finally correctly translated. The work leads the reader in the first chapter generally to the field of Machine Translation and following that, it focuses on the special field of Example-based Machine Translation. Next, an important part of the doctoral thesis dissertation is devoted to the theory of idiomatic expressions. The practical part of the thesis describes how the hybrid Example-based Machine Translation system METIS-II, with the help of morphosyntactic rules, is able to correctly process certain idiomatic expressions and finally, to translate them. The following chapter deals with the function of the transfer system CAT2 and its handling of the idiomatic expressions. The last part of the thesis includes the evaluation of three commercial systems, namely SYSTRAN, T1 Langenscheidt, and Power Translator Pro, with respect to continuous and discontinuous idiomatic expressions. For this, both small corpora and a part of the extensive corpus Europarl and the Digital Lexicon of the German Language in 20th century were processed, firstly manually and then automatically. The dissertation concludes with results from this evaluation.Idiomatische Redewendungen stellen für heutige maschinelle Übersetzungssysteme eine besondere Herausforderung dar, da ihre Übersetzung nicht wörtlich, sondern stets sinngemäß erfolgen muss. Die vorliegende Dissertation zeigt, wie mit Hilfe eines Korpus sowie morphosyntaktischer Regeln solche idiomatische Redewendungen erkannt und am Ende richtig übersetzt werden können. Die Arbeit führt den Leser im ersten Kapitel allgemein in das Gebiet der Maschinellen Übersetzung ein und vertieft im Anschluss daran das Spezialgebiet der Beispielbasierten Maschinellen Übersetzung. Im Folgenden widmet sich ein wesentlicher Teil der Doktorarbeit der Theorie über idiomatische Redewendungen. Der praktische Teil der Arbeit beschreibt wie das hybride Beispielbasierte Maschinelle Übersetzungssystem METIS-II mit Hilfe von morphosyntaktischen Regeln befähigt wurde, bestimmte idiomatische Redewendungen korrekt zu bearbeiten und am Ende zu übersetzen. Das nachfolgende Kapitel behandelt die Funktion des Transfersystems CAT2 und dessen Umgang mit idiomatischen Wendungen. Der letzte Teil der Arbeit beinhaltet die Evaluation von drei kommerzielle Systemen, nämlich SYSTRAN, T1 Langenscheidt und Power Translator Pro, in Bezug auf deren Umgang mit kontinuierlichen und diskontinuierlichen idiomatischen Redewendungen. Hierzu wurden sowohl kleine Korpora als auch ein Teil des umfangreichen Korpus Europarl und des Digatalen Wörterbuchs der deutschen Sprache des 20. Jh. erst manuell und dann maschinell bearbeitet. Die Dissertation wird mit Folgerungen aus der Evaluation abgeschlossen
Towards Interoperable Research Infrastructures for Environmental and Earth Sciences
This open access book summarises the latest developments on data management in the EU H2020 ENVRIplus project, which brought together more than 20 environmental and Earth science research infrastructures into a single community. It provides readers with a systematic overview of the common challenges faced by research infrastructures and how a ‘reference model guided’ engineering approach can be used to achieve greater interoperability among such infrastructures in the environmental and earth sciences. The 20 contributions in this book are structured in 5 parts on the design, development, deployment, operation and use of research infrastructures. Part one provides an overview of the state of the art of research infrastructure and relevant e-Infrastructure technologies, part two discusses the reference model guided engineering approach, the third part presents the software and tools developed for common data management challenges, the fourth part demonstrates the software via several use cases, and the last part discusses the sustainability and future directions