Search CORE

1,323 research outputs found

Semantic Parsing in Limited Resource Conditions

Author: Li Zhuang
Publication date: 14/09/2023

This thesis explores challenges in semantic parsing, specifically focusing on scenarios with limited data and computational resources. It offers solutions using techniques like automatic data curation, knowledge transfer, active learning, and continual learning. For tasks with no parallel training data, the thesis proposes generating synthetic training examples from structured database schemas. When there is abundant data in a source domain but limited parallel data in a target domain, knowledge from the source is leveraged to improve parsing in the target domain. For multilingual situations with limited data in the target languages, the thesis introduces a method to adapt parsers using a limited human translation budget. Active learning is applied to select source-language samples for manual translation, maximizing parser performance in the target language. An alternative method is also proposed that uses machine translation services, supplemented by human-translated data, to train a more effective parser. When computational resources are limited, a continual learning approach is introduced to minimize training time and memory use. This maintains the parser's performance on previously learned tasks while adapting it to new tasks, mitigating the problem of catastrophic forgetting. Overall, the thesis provides a comprehensive set of methods to improve semantic parsing in resource-constrained conditions. Comment: PhD thesis, year of award 2023, 172 pages

arXiv.org e-Print Archive

An Islamic Law Review of the Practice of Buying and Selling Scattering Roses by the Handful (Case Study at Pasar Kembang Surakarta)

Author: DIANI YULIDA FANIA
Sirizar Sholakhuddin, H. M. A.
Publication date: 07/05/2020

iainska repository

The Circle of Meaning: From Translation to Paraphrasing and Back

Author: Madnani Nitin
Publication date: 01/01/2010

The preservation of meaning between inputs and outputs is perhaps the most ambitious and, often, the most elusive goal of systems that attempt to process natural language. Nowhere is this goal of more obvious importance than for the tasks of machine translation and paraphrase generation. Preserving meaning between the input and the output is paramount for both, the monolingual vs bilingual distinction notwithstanding. In this thesis, I present a novel, symbiotic relationship between these two tasks that I term the "circle of meaning". Today's statistical machine translation (SMT) systems require high quality human translations for parameter tuning, in addition to large bi-texts for learning the translation units. This parameter tuning usually involves generating translations at different points in the parameter space and obtaining feedback against human-authored reference translations as to how good the translations are. This feedback then dictates what point in the parameter space should be explored next. To measure this feedback, it is generally considered wise to have multiple (usually 4) reference translations to avoid unfair penalization of translation hypotheses, which could easily happen given the large number of ways in which a sentence can be translated from one language to another. However, this reliance on multiple reference translations creates a problem since they are labor intensive and expensive to obtain. Therefore, most current MT datasets only contain a single reference. This leads to the problem of reference sparsity (the primary open problem that I address in this dissertation), one that has a serious effect on the SMT parameter tuning process. Bannard and Callison-Burch (2005) were the first to provide a practical connection between phrase-based statistical machine translation and paraphrase generation. However, their technique is restricted to generating phrasal paraphrases.
I build upon their approach and augment a phrasal paraphrase extractor into a sentential paraphraser with extremely broad coverage. The novelty in this augmentation lies in the further strengthening of the connection between statistical machine translation and paraphrase generation; whereas Bannard and Callison-Burch only relied on SMT machinery to extract phrasal paraphrase rules and stopped there, I take it a few steps further and build a full English-to-English SMT system. This system can, as expected, "translate" any English input sentence into a new English sentence with the same degree of meaning preservation that exists in a bilingual SMT system. In fact, being a state-of-the-art SMT system, it is able to generate n-best "translations" for any given input sentence. This sentential paraphraser, built almost entirely from existing SMT machinery, represents the first 180 degrees of the circle of meaning. To complete the circle, I describe a novel connection in the other direction. I claim that the sentential paraphraser, once built in this fashion, can provide a solution to the reference sparsity problem and, hence, be used to improve the performance of a bilingual SMT system. I discuss two different instantiations of the sentential paraphraser and show several results that provide empirical validation for this connection.

CiteSeerX

Digital Repository at the University of Maryland

Paraphrasing and Translation

Author: Callison-Burch Chris
Publication venue: The University of Edinburgh
Publication date: 01/01/2007

Paraphrasing and translation have previously been treated as unconnected natural language processing tasks. Whereas translation represents the preservation of meaning when an idea is rendered in the words of a different language, paraphrasing represents the preservation of meaning when an idea is expressed using different words in the same language. We show that the two are intimately related. The major contributions of this thesis are as follows:
• We define a novel technique for automatically generating paraphrases using bilingual parallel corpora, which are more commonly used as training data for statistical models of translation.
• We show that paraphrases can be used to improve the quality of statistical machine translation by addressing the problem of coverage and introducing a degree of generalization into the models.
• We explore the topic of automatic evaluation of translation quality, and show that the current standard evaluation methodology cannot be guaranteed to correlate with human judgments of translation quality.
Whereas previous data-driven approaches to paraphrasing were dependent upon either data sources which were uncommon, such as multiple translations of the same source text, or language-specific resources such as parsers, our approach is able to harness more widely available parallel corpora and can be applied to any language which has a parallel corpus. The technique was evaluated by replacing phrases with their paraphrases, and asking judges whether the meaning of the original phrase was retained and whether the resulting sentence remained grammatical. Paraphrases extracted from a parallel corpus with manual alignments are judged to be accurate (both meaningful and grammatical) 75% of the time, retaining the meaning of the original phrase 85% of the time.
Using automatic alignments, meaning can be retained at a rate of 70%. Being a language-independent and probabilistic approach allows our method to be easily integrated into statistical machine translation. A paraphrase model derived from parallel corpora other than the one used to train the translation model can be used to increase the coverage of statistical machine translation by adding translations of previously unseen words and phrases. If the translation of a word was not learned, but a translation of a synonymous word has been learned, then the word is paraphrased and its paraphrase is translated. Phrases can be treated similarly. Results show that augmenting a state-of-the-art SMT system with paraphrases in this way leads to significantly improved coverage and translation quality. For a training corpus with 10,000 sentence pairs, we increase the coverage of unique test set unigrams from 48% to 90%, with more than half of the newly covered items accurately translated, as opposed to none in current approaches.
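
The fallback described in the abstract above (translate a word directly if a translation was learned, otherwise paraphrase it and translate the paraphrase) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names and the toy `TRANSLATIONS` and `PARAPHRASES` tables are hypothetical, and a real system would use probabilistic phrase tables rather than plain dictionaries.

```python
from typing import Dict, Iterable, List, Optional

# Toy, hypothetical tables for illustration only; not data from the thesis.
TRANSLATIONS: Dict[str, str] = {"car": "voiture", "house": "maison"}
PARAPHRASES: Dict[str, List[str]] = {"automobile": ["motorcar", "car"], "home": ["house"]}


def translate_with_fallback(
    word: str,
    translations: Dict[str, str],
    paraphrases: Dict[str, List[str]],
) -> Optional[str]:
    """Translate `word`, falling back to the translation of one of its paraphrases."""
    # A translation was learned for the word itself: use it.
    if word in translations:
        return translations[word]
    # Otherwise try the paraphrases in order and translate the first covered one.
    for candidate in paraphrases.get(word, []):
        if candidate in translations:
            return translations[candidate]
    # Neither the word nor any of its paraphrases is covered.
    return None


def unigram_coverage(
    words: Iterable[str],
    translations: Dict[str, str],
    paraphrases: Dict[str, List[str]],
) -> float:
    """Fraction of unique words that receive some translation."""
    unique = set(words)
    if not unique:
        return 0.0
    covered = sum(
        1 for w in unique
        if translate_with_fallback(w, translations, paraphrases) is not None
    )
    return covered / len(unique)
```

Calling `unigram_coverage` once with an empty paraphrase table and once with `PARAPHRASES` shows, on the toy data, how paraphrasing raises coverage of otherwise unseen words.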

CiteSeerX

Edinburgh Research Archive

Investigating translation strategies in Indonesian best seller novel

Author: Daud Bukhari
Fata Ika Apriani
Jannah Miftahul
Muktabar Fadhilah
Wahyuni Sri
Publication venue: Universitas Islam Negeri Ar-Raniry
Publication date: 30/04/2022

Translation strategies have been the subject of extensive investigation. Most people believe that translators use specific strategies and that basic translation strategies are sometimes insufficient. As a result, numerous scholars have investigated and analyzed various translation techniques from various perspectives. This study identified the translation strategies used in the novel Negeri 5 Menara and its English version, The Land of Five Towers, using Baker's (2011) framework and a descriptive qualitative technique. There were 130 data points in all. According to the findings, use of a more general word accounted for 11% of all translation strategies, use of a more neutral or less expressive word for 14%, cultural substitution for 8%, loan words for 5%, omission for 4%, paraphrase with related words for 57%, and paraphrase with unrelated words for 2%; there was no data on illustration. There were 21 uncategorized data points for every given strategy. It is suggested that, in the future, a translator who is also a pre-service teacher should broaden his or her translation methodologies in order to deal with non-equivalence in translation.

Pusat Jurnal UIN Ar-Raniry (Universitas Islam Negeri)

Adapting Automatic Summarization to New Sources of Information

Author: Ouyang Jessica Jin
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2019

English-language news articles are no longer necessarily the best source of information. The Web allows information to spread more quickly and travel farther: first-person accounts of breaking news events pop up on social media, and foreign-language news articles are accessible to, if not immediately understandable by, English-speaking users. This thesis focuses on developing automatic summarization techniques for these new sources of information. We focus on summarizing two specific new sources of information: personal narratives, first-person accounts of exciting or unusual events that are readily found in blog entries and other social media posts, and non-English documents, which must first be translated into English, often introducing translation errors that complicate the summarization process. Personal narratives are a very new area of interest in natural language processing research, and they present two key challenges for summarization. First, unlike many news articles, whose lead sentences serve as summaries of the most important ideas in the articles, personal narratives provide no such shortcuts for determining where important information occurs within them; second, personal narratives are written informally and colloquially, and unlike news articles, they are rarely edited, so they require heavier editing and rewriting during the summarization process. Non-English documents, whether news or narrative, present yet another source of difficulty on top of any challenges inherent to their genre: they must be translated into English, potentially introducing translation errors and disfluencies that must be identified and corrected during summarization. The bulk of this thesis is dedicated to addressing the challenges of summarizing personal narratives found on the Web. We develop a two-stage summarization system for personal narratives that first extracts sentences containing important content and then rewrites those sentences into summary-appropriate forms.
Our content extraction system is inspired by contextualist narrative theory, using changes in writing style throughout a narrative to detect sentences containing important information; it outperforms both graph-based and neural network approaches to sentence extraction for this genre. Our paraphrasing system rewrites the extracted sentences into shorter, standalone summary sentences, learning to mimic the paraphrasing choices of human summarizers more closely than can traditional lexicon- or translation-based paraphrasing approaches. We conclude with a chapter dedicated to summarizing non-English documents written in low-resource languages – documents that would otherwise be unreadable for English-speaking users. We develop a cross-lingual summarization system that performs even heavier editing and rewriting than does our personal narrative paraphrasing system; we create and train on large amounts of synthetic errorful translations of foreign-language documents. Our approach produces fluent English summaries from disfluent translations of non-English documents, and it generalizes across languages.

Columbia University Academic Commons

Understanding and Enhancing the Use of Context for Machine Translation

Author: Fadaee Marzieh
Publication date: 01/01/2020

To understand and infer meaning in language, neural models have to learn complicated nuances. Discovering distinctive linguistic phenomena from data is not an easy task. For instance, lexical ambiguity is a fundamental feature of language which is challenging to learn. Even more prominently, inferring the meaning of rare and unseen lexical units is difficult with neural networks. Meaning is often determined from context. With context, languages allow meaning to be conveyed even when the specific words used are not known by the reader. To model this learning process, a system has to learn from a few instances in context and be able to generalize well to unseen cases. The learning process is hindered when training data is scarce for a task. Even with sufficient data, learning patterns for the long tail of the lexical distribution is challenging. In this thesis, we focus on understanding certain potentials of contexts in neural models and design augmentation models to benefit from them. We focus on machine translation as an important instance of the more general language understanding problem. To translate from a source language to a target language, a neural model has to understand the meaning of constituents in the provided context and generate constituents with the same meanings in the target language. This task accentuates the value of capturing nuances of language and the necessity of generalization from few observations. The main problem we study in this thesis is what neural machine translation models learn from data and how we can devise more focused contexts to enhance this learning. Looking more in-depth into the role of context and the impact of data on learning models is essential to advance the NLP field. Moreover, it helps highlight the vulnerabilities of current neural networks and provides insights into designing more robust models. Comment: PhD dissertation defended on November 10th, 202

arXiv.org e-Print Archive

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Translating Culture-Bound Elements in the Film Zootropolis

Author: Häppölä Tiia
Publication venue: Helsingfors universitet
Publication date: 01/01/2021

The aim of this master's thesis is to examine how culture-bound elements have been translated in Disney's film Zootropolis. The main sources are Fredric Chaume (2012) on dubbing, Lawrence Venuti (1995) on domestication, Ritva Leppihalme's work (1997) on the translation of allusions, and Jan Pedersen's (2011) theory of extralinguistic cultural references. The material consists of the original English-language version of Zootropolis and its Finnish translation. The culture-bound elements appearing in the film are listed and divided into six categories: names, nicknames and insults; forms of address; institutions, professions and society; idioms and colloquialisms; general cultural knowledge; and references to pop culture. The results indicate that the translator did not follow a single global translation strategy; instead, each culture-bound element was translated case by case, sometimes with a domesticating and sometimes with a foreignizing strategy. The most frequently used strategies were situational substitution and direct translation, which were distributed fairly evenly across the categories. The largest differences were in the category of institutions, professions and society, where direct translation was clearly the most common strategy and substitution was used very little, and in the category of idioms and colloquialisms, where situational substitution was clearly the most common and direct translation was used very rarely.

Helsingin yliopiston digitaalinen arkisto

Explicit Sentence Compression for Neural Machine Translation

Author: Chen Kehai
Li Zuchao
Sumita Eiichiro
Utiyama Masao
Wang Rui
Zhang Zhuosheng
Zhao Hai
Publication date: 26/12/2019

State-of-the-art Transformer-based neural machine translation (NMT) systems still follow a standard encoder-decoder framework, in which the source sentence representation is produced by an encoder with a self-attention mechanism. Though a Transformer-based encoder may effectively capture general information in its resulting source sentence representation, the backbone information, which stands for the gist of a sentence, is not specifically focused on. In this paper, we propose an explicit sentence compression method to enhance the source sentence representation for NMT. In practice, an explicit sentence compression goal is used to learn the backbone information in a sentence. We propose three ways, including backbone source-side fusion, target-side fusion, and both-side fusion, to integrate the compressed sentence into NMT. Our empirical tests on the WMT English-to-French and English-to-German translation tasks show that the proposed sentence compression method significantly improves translation performance over strong baselines. Comment: Work in progress; part of this work is accepted in AAAI-202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications