24 research outputs found

    Adaptive Machine Translation with Large Language Models

    Consistency is a key requirement of high-quality translation. It is especially important to adhere to pre-approved terminology and adapt to corrected translations in domain-specific projects. Machine translation (MT) has achieved significant progress in the area of domain adaptation. However, real-time adaptation remains challenging. Large-scale language models (LLMs) have recently shown interesting capabilities of in-context learning, where they learn to replicate certain input-output text generation patterns, without further fine-tuning. By feeding an LLM at inference time with a prompt that consists of a list of translation pairs, it can then simulate the domain and style characteristics. This work aims to investigate how we can utilize in-context learning to improve real-time adaptive MT. Our extensive experiments show promising results at translation time. For example, GPT-3.5 can adapt to a set of in-domain sentence pairs and/or terminology while translating a new sentence. We observe that the translation quality with few-shot in-context learning can surpass that of strong encoder-decoder MT systems, especially for high-resource languages. Moreover, we investigate whether we can combine MT from strong encoder-decoder models with fuzzy matches, which can further improve translation quality, especially for less supported languages. We conduct our experiments across five diverse language pairs, namely English-to-Arabic (EN-AR), English-to-Chinese (EN-ZH), English-to-French (EN-FR), English-to-Kinyarwanda (EN-RW), and English-to-Spanish (EN-ES).
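The prompt construction described in this abstract (a list of translation pairs followed by the new sentence) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fuzzy_matches` and `build_prompt`, the prompt layout, and the use of `difflib` similarity to pick fuzzy matches are all assumptions made for illustration, and no LLM call is included.

```python
from difflib import SequenceMatcher


def fuzzy_matches(source, memory, k=2):
    """Return the k (source, target) pairs whose source side is most similar to `source`.

    Similarity here is difflib's character-level ratio, an illustrative stand-in
    for whatever fuzzy-matching method a real translation memory would use.
    """
    ranked = sorted(
        memory,
        key=lambda pair: SequenceMatcher(None, source, pair[0]).ratio(),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(source, memory, src_lang="English", tgt_lang="Spanish", k=2):
    """Build a few-shot prompt: k in-domain pairs, then the new source sentence."""
    lines = []
    for src, tgt in fuzzy_matches(source, memory, k):
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    # The final target line is left open for the model to complete.
    lines.append(f"{src_lang}: {source}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)
```

The returned string would be sent to a language model as-is; the model's completion of the last line is then taken as the translation.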

    Domain-specific text generation for machine translation

    Preservation of domain knowledge from source to target is crucial in any translation workflow. It is common in the translation industry to receive highly specialized projects, where there is hardly any parallel in-domain data. In such scenarios where there is insufficient in-domain data to fine-tune Machine Translation (MT) models, producing translations that are consistent with the relevant context is challenging. In this work, we propose a novel approach to domain adaptation leveraging state-of-the-art pretrained language models (LMs) for domain-specific data augmentation for MT, simulating the domain characteristics of either (a) a small bilingual dataset, or (b) the monolingual source text to be translated. Combining this idea with back-translation, we can generate huge amounts of synthetic bilingual in-domain data for both use cases. For our investigation, we use the state-of-the-art Transformer architecture. We employ mixed fine-tuning to train models that significantly improve translation of in-domain texts. More specifically, in both scenarios, our proposed methods achieve improvements of approximately 5-6 BLEU and 2-3 BLEU, respectively, on the Arabic-to-English and English-to-Arabic language pairs. Furthermore, the outcome of human evaluation corroborates the automatic evaluation results.

    Sounds of Accompaniment

    Sensate sovereignty: A dialogue on Dylan Robinson's Hungry Listening

    Dylan Robinson's Hungry Listening: Resonant Theory for Indigenous Sound Studies emerges from encounters between Indigenous sound performance and Western art music. The book takes aim at the pernicious tendency for the latter to insist upon aesthetic assimilation as the end-goal of these encounters, which far too often means derogating the former’s ontologies and protocols of song. In this dialogue-review, members of The Culture and Technology Discussion and Working Group (The CATDAWG) situate the book within sound studies and critiques of settler colonial listening, reflecting on the major conceptual contributions of the book such as sensate sovereignty, hungry listening, and critical listening positionality.

    Ecosystem Overfishing in the Ocean

    Fisheries catches represent a net export of mass and energy that can no longer be used by trophic levels higher than those fished. Thus, exploitation implies a depletion of secondary production of higher trophic levels (here the production of mass and energy by herbivores and carnivores in the ecosystem) due to the removal of prey. The depletion of secondary production due to the export of biomass and energy through catches was recently formulated as a proxy for evaluating the ecosystem impacts of fishing, i.e., the level of ecosystem overfishing. Here we evaluate the historical and current risk of ecosystem overfishing at a global scale by quantifying the depletion of secondary production using the best available fisheries and ecological data (i.e., catch and primary production). Our results highlight an increasing trend in the number of unsustainable fisheries (i.e., an increase in the risk of ecosystem overfishing) from the 1950s to the 2000s, and illustrate the worldwide geographic expansion of overfishing. These results make it possible to assess when and where fishing became unsustainable at the ecosystem level. At present, total catch per capita from Large Marine Ecosystems is at least twice the value estimated to ensure fishing at moderate sustainable levels.

    Telomerecat: A ploidy-agnostic method for estimating telomere length from whole genome sequencing data.

    Telomere length is a risk factor in disease and the dynamics of telomere length are crucial to our understanding of cell replication and vitality. The proliferation of whole genome sequencing represents an unprecedented opportunity to glean new insights into telomere biology on a previously unimaginable scale. To this end, a number of approaches for estimating telomere length from whole-genome sequencing data have been proposed. Here we present Telomerecat, a novel approach to the estimation of telomere length. Previous methods have been dependent on the number of telomeres present in a cell being known, which may be problematic when analysing aneuploid cancer data and non-human samples. Telomerecat is designed to be agnostic to the number of telomeres present, making it suited for the purpose of estimating telomere length in cancer studies. Telomerecat also accounts for interstitial telomeric reads and presents a novel approach to dealing with sequencing errors. We show that Telomerecat performs well at telomere length estimation when compared to leading experimental and computational methods. Furthermore, we show that it detects expected patterns in longitudinal data, repeated measurements, and cross-species comparisons. We also apply the method to cancer cell data, uncovering an interesting relationship with the underlying telomerase genotype.

    Publisher Correction: Telomerecat: A ploidy-agnostic method for estimating telomere length from whole genome sequencing data.

    A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has been fixed in the paper.

    Process and politics in interactive musical works

    Thesis: S.M. in Comparative Media Studies, Massachusetts Institute of Technology, Department of Humanities, 2016. Cataloged from PDF version of thesis. Includes bibliographical references (pages 83-85). By Andy Kelleher Stuhl.
    As everyday musical experiences move further into software platforms, an interest among musicians in taking fuller advantage of computational media produces a strand of interactive, software-based musical works I call open mediational music. This phenomenon stands apart from other types of creative work centered on music and interaction by valorizing the listener's responsibility for instantiating musical works. It also advances an agenda of openness with respect to interactivity as a principle of new media. I center four case studies on a set of interactive musical works that exemplify this phenomenon: Reflective by Reiko Yamada, Thicket by Morgan Packard and Joshue Ott, Jazz.Computer by Yotam Mann and Sarah Rothberg, and Baggage Allowance by Pamela Z. Each of these works takes shape out of unique motivations and in different forms and settings. Collectively, they advance a notion of platforms as objects of critical awareness and propose listening as a model for mindful participation in algorithmic environments. Illuminating the distinct claims that sound and software hold on one another as creative domains, open mediational music invites listeners to rehearse a conscientious engagement with the sites and conditions of computationally mediated cultural encounter.