Leveraging Language to Learn Program Abstractions and Search Heuristics
Inductive program synthesis, or inferring programs from examples of desired
behavior, offers a general paradigm for building interpretable, robust, and
generalizable machine learning systems. Effective program synthesis depends on
two key ingredients: a strong library of functions from which to build
programs, and an efficient search strategy for finding programs that solve a
given task. We introduce LAPS (Language for Abstraction and Program Search), a
technique for using natural language annotations to guide joint learning of
libraries and neurally-guided search models for synthesis. When integrated into
a state-of-the-art library learning system (DreamCoder), LAPS produces
higher-quality libraries and improves search efficiency and generalization on
three domains -- string editing, image composition, and abstract reasoning
about scenes -- even when no natural language hints are available at test time.
Comment: appeared in the Thirty-eighth International Conference on Machine
Learning (ICML 2021).
Enriching the Swedish Sign Language Corpus with Part of Speech Tags Using Joint Bayesian Word Alignment and Annotation Transfer
We have used a novel Bayesian model of joint word alignment and part of speech (PoS) annotation transfer to enrich the Swedish Sign Language Corpus with PoS tags. The annotations were then hand-corrected, both to improve annotation quality for the corpus and to allow the empirical evaluation presented herein.
A Bayesian model for joint word alignment and part-of-speech transfer
Current methods for word alignment require considerable amounts of parallel text to deliver accurate results, a requirement which is met only for a small minority of the world's approximately 7,000 languages. We show that by jointly performing word alignment and annotation transfer in a novel Bayesian model, alignment accuracy can be improved for language pairs where annotations are available for only one of the languages, a finding which could facilitate the study and processing of a vast number of low-resource languages. We also present an evaluation where our method is used to perform single-source and multi-source part-of-speech transfer with 22 translations of the same text in four different languages. This allows us to quantify the considerable variation in accuracy depending on the specific source text(s) used, even with different translations into the same language. Non peer reviewed
A Systematic Bayesian Treatment of the IBM Alignment Models
The dominant yet ageing IBM and HMM word alignment models underpin most popular Statistical Machine Translation implementations in use today. Though beset by the limitations of implausible independence assumptions, intractable optimisation problems, and an excess of tunable parameters, these models provide a scalable and reliable starting point for inducing translation systems. In this paper we build upon this venerable base by recasting these models in the non-parametric Bayesian framework. By replacing the categorical distributions at their core with hierarchical Pitman-Yor processes, and through the use of collapsed Gibbs sampling, we provide a more flexible formulation and sidestep the original heuristic optimisation techniques. The resulting models are highly extendible, naturally permitting the introduction of phrasal dependencies. We present extensive experimental results showing improvements in both AER and BLEU when benchmarked against Giza++, including significant improvements over IBM model 4.