Adaptive Machine Translation with Large Language Models
Consistency is a key requirement of high-quality translation. It is
especially important to adhere to pre-approved terminology and adapt to
corrected translations in domain-specific projects. Machine translation (MT)
has achieved significant progress in the area of domain adaptation. However,
real-time adaptation remains challenging. Large-scale language models (LLMs)
have recently shown interesting capabilities of in-context learning, where they
learn to replicate certain input-output text generation patterns, without
further fine-tuning. By feeding an LLM at inference time with a prompt that
consists of a list of translation pairs, it can then simulate the domain and
style characteristics. This work aims to investigate how we can utilize
in-context learning to improve real-time adaptive MT. Our extensive experiments
show promising results at translation time. For example, GPT-3.5 can adapt to a
set of in-domain sentence pairs and/or terminology while translating a new
sentence. We observe that the translation quality with few-shot in-context
learning can surpass that of strong encoder-decoder MT systems, especially for
high-resource languages. Moreover, we investigate whether we can combine MT
from strong encoder-decoder models with fuzzy matches, which can further
improve translation quality, especially for less supported languages. We
conduct our experiments across five diverse language pairs, namely
English-to-Arabic (EN-AR), English-to-Chinese (EN-ZH), English-to-French
(EN-FR), English-to-Kinyarwanda (EN-RW), and English-to-Spanish (EN-ES).
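
The prompt construction described in the abstract above (a list of translation pairs followed by the new sentence) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the prompt layout, the use of `difflib` string similarity for fuzzy matching, and the example translation memory are all assumptions made for illustration, and the call to the LLM itself is left out.

```python
from difflib import SequenceMatcher


def top_fuzzy_matches(source, memory, k=2):
    """Return the k translation pairs whose source side is most similar to `source`.

    Similarity here is difflib's character-based ratio (an assumption; the
    abstract does not say how fuzzy matches are retrieved).
    """
    ranked = sorted(
        memory,
        key=lambda pair: SequenceMatcher(None, source.lower(), pair[0].lower()).ratio(),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(source, memory, src_lang="English", tgt_lang="Spanish", k=2):
    """Assemble a few-shot prompt: retrieved pairs first, then the new sentence."""
    lines = []
    for src, tgt in top_fuzzy_matches(source, memory, k):
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    # The final target line is left empty for the model to complete.
    lines.append(f"{src_lang}: {source}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


# Hypothetical in-domain translation memory (illustrative data only).
memory = [
    ("Click the Save button.", "Haga clic en el botón Guardar."),
    ("The weather is nice today.", "Hoy hace buen tiempo."),
    ("Click the Cancel button.", "Haga clic en el botón Cancelar."),
]

prompt = build_prompt("Click the Print button.", memory)
print(prompt)
```

The resulting string would be sent to the model at inference time; because the retrieved pairs come from the same domain, the completion is nudged toward their terminology and style without any fine-tuning.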
Domain-specific text generation for machine translation
Preservation of domain knowledge from the source to target is crucial in any translation
workflow. It is common in the translation industry to receive highly specialized projects,
where there is hardly any parallel in-domain data. In such scenarios where there is insufficient
in-domain data to fine-tune Machine Translation (MT) models, producing translations that
are consistent with the relevant context is challenging. In this work, we propose a novel
approach to domain adaptation leveraging state-of-the-art pretrained language models (LMs)
for domain-specific data augmentation for MT, simulating the domain characteristics of
either (a) a small bilingual dataset, or (b) the monolingual source text to be translated.
Combining this idea with back-translation, we can generate huge amounts of synthetic
bilingual in-domain data for both use cases. For our investigation, we use the state-of-the-art
Transformer architecture. We employ mixed fine-tuning to train models that significantly
improve translation of in-domain texts. More specifically, in both scenarios, our proposed
methods achieve improvements of approximately 5-6 BLEU and 2-3 BLEU, respectively, on
the Arabic-to-English and English-to-Arabic language pairs. Furthermore, the outcome of
human evaluation corroborates the automatic evaluation results.
Sensate sovereignty: A dialogue on Dylan Robinson's Hungry Listening
Dylan Robinson's Hungry Listening: Resonant Theory for Indigenous Sound Studies emerges from encounters between Indigenous sound performance and Western art music. The book takes aim at the pernicious tendency for the latter to insist upon aesthetic assimilation as the end-goal of these encounters, which far too often means derogating the former’s ontologies and protocols of song. In this dialogue-review, members of The Culture and Technology Discussion and Working Group (The CATDAWG) situate the book within sound studies and critiques of settler colonial listening, reflecting on the book's major conceptual contributions, such as sensate sovereignty, hungry listening, and critical listening positionality.
Ecosystem Overfishing in the Ocean
Fisheries catches represent a net export of mass and energy that can no longer be used by trophic levels higher than those fished. Thus, exploitation implies a depletion of secondary production of higher trophic levels (here the production of mass and energy by herbivores and carnivores in the ecosystem) due to the removal of prey. The depletion of secondary production due to the export of biomass and energy through catches was recently formulated as a proxy for evaluating the ecosystem impacts of fishing (i.e., the level of ecosystem overfishing). Here we evaluate the historical and current risk of ecosystem overfishing at a global scale by quantifying the depletion of secondary production using the best available fisheries and ecological data (i.e., catch and primary production). Our results highlight an increasing trend in the number of unsustainable fisheries (i.e., an increase in the risk of ecosystem overfishing) from the 1950s to the 2000s, and illustrate the worldwide geographic expansion of overfishing. These results make it possible to assess when and where fishing became unsustainable at the ecosystem level. At present, total catch per capita from Large Marine Ecosystems is at least twice the value estimated to ensure fishing at moderate sustainable levels.
Telomerecat: A ploidy-agnostic method for estimating telomere length from whole genome sequencing data.
Telomere length is a risk factor in disease and the dynamics of telomere length are crucial to our understanding of cell replication and vitality. The proliferation of whole genome sequencing represents an unprecedented opportunity to glean new insights into telomere biology on a previously unimaginable scale. To this end, a number of approaches for estimating telomere length from whole-genome sequencing data have been proposed. Here we present Telomerecat, a novel approach to the estimation of telomere length. Previous methods have been dependent on the number of telomeres present in a cell being known, which may be problematic when analysing aneuploid cancer data and non-human samples. Telomerecat is designed to be agnostic to the number of telomeres present, making it suited for the purpose of estimating telomere length in cancer studies. Telomerecat also accounts for interstitial telomeric reads and presents a novel approach to dealing with sequencing errors. We show that Telomerecat performs well at telomere length estimation when compared to leading experimental and computational methods. Furthermore, we show that it detects expected patterns in longitudinal data, repeated measurements, and cross-species comparisons. We also apply the method to cancer cell data, uncovering an interesting relationship with the underlying telomerase genotype.
Publisher Correction: Telomerecat: A ploidy-agnostic method for estimating telomere length from whole genome sequencing data.
A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has been fixed in the paper.
Process and politics in interactive musical works
Thesis: S.M. in Comparative Media Studies, Massachusetts Institute of Technology, Department of Humanities, 2016. By Andy Kelleher Stuhl. Cataloged from PDF version of thesis. Includes bibliographical references (pages 83-85).
As everyday musical experiences move further into software platforms, an interest among musicians in taking fuller advantage of computational media produces a strand of interactive, software-based musical works I call open mediational music. This phenomenon stands apart from other types of creative work centered on music and interaction by valorizing the listener's responsibility for instantiating musical works. It also advances an agenda of openness with respect to interactivity as a principle of new media. I center four case studies on a set of interactive musical works that exemplify this phenomenon: Reflective by Reiko Yamada, Thicket by Morgan Packard and Joshue Ott, Jazz.Computer by Yotam Mann and Sarah Rothberg, and Baggage Allowance by Pamela Z. Each of these works takes shape out of unique motivations and in different forms and settings. Collectively, they advance a notion of platforms as objects of critical awareness and propose listening as a model for mindful participation in algorithmic environments. Illuminating the distinct claims that sound and software hold on one another as creative domains, open mediational music invites listeners to rehearse a conscientious engagement with the sites and conditions of computationally mediated cultural encounter.