
178 research outputs found

BiPhone: Modeling Inter Language Phonetic Influences in Text

Author: Gupta Abhirut
Jash Ambarish
Raghuveer Aravindan
Ren James S.
Sai Ananya B.
Sodhi Sukhdeep S.
Sproat Richard
Vasilevski Yuri
Publication date: 06/07/2023

Because of technology asymmetries, a large number of people are forced to use the Web in a language in which they have low literacy. Written text in the second language (L2) from such users often contains many errors that are influenced by their native language (L1). We propose a method to mine phoneme confusions (sounds in L2 that an L1 speaker is likely to conflate) for pairs of L1 and L2. These confusions are then plugged into a generative model (Bi-Phone) for synthetically producing corrupted L2 text. Through human evaluations, we show that Bi-Phone generates plausible corruptions that differ across L1s and also have widespread coverage on the Web. We also corrupt the popular language understanding benchmark SuperGLUE with our technique (FunGLUE for Phonetically Noised GLUE) and show that SoTA language understanding models perform poorly. We introduce a new phoneme prediction pre-training task that helps byte models recover performance close to SuperGLUE. Finally, we release the FunGLUE benchmark to promote further research in phonetically robust language models. To the best of our knowledge, FunGLUE is the first benchmark to introduce L1-L2 interactions in text. Comment: Accepted at ACL 202

arXiv.org e-Print Archive

Sequential NMR assignments of labile protons in DNA using two-dimensional nuclear-Overhauser-enhancement spectroscopy with three jump-and-return pulse sequences

Author: Gait M. J.
Ganesh K. N.
Gruetter R.
Leupin W.
Minganti C.
Otting G.
Sproat B. S.
Wüthrich K.
Publication date: 28/05/2012

Two-dimensional nuclear Overhauser enhancement (NOESY) spectra of labile protons were recorded in H2O solutions of a protein and of a DNA duplex, using a modification of the standard NOESY experiment in which all three 90-degree pulses are replaced by jump-and-return sequences. For both the protein and the DNA fragment, the strategically important spectral regions could be recorded with good sensitivity and free of artifacts. Using this procedure, sequence-specific assignments were obtained for the imino protons, C2H of adenine, and C4NH2 of cytosine in a 23-base-pair DNA duplex that includes the 17-base-pair OR3 repressor binding site of bacteriophage lambda. Based on comparison with previously published results on the isolated OR3 binding site, these data were used to study chain termination effects on the chemical shifts of imino proton resonances of DNA duplexes.

Infoscience - École polytechnique fédérale de Lausanne

Language is a key adaptation of our species, yet we do not know when it evolved. Here, we use data on language phonemic diversity to estimate a minimum date for the origin of language. We take advantage of the fact that phonemic diversity evolves slowly and use it as a clock to calculate how long the oldest African languages would have to have existed in order to accumulate the number of phonemes they possess today. We use a natural experiment, the colonization of Southeast Asia and the Andaman Islands, to estimate the rate at which phonemic diversity increases through time. Using this rate, we estimate that present-day languages date back to the Middle Stone Age in Africa. Our analysis is consistent with the archaeological evidence suggesting that complex human behavior evolved during the Middle Stone Age in Africa, and does not support the view that language is a recent adaptation that sparked the dispersal of humans out of Africa. While some of our assumptions require testing and our results rely at present on a single case study, our analysis constitutes the first estimate of when language evolved that is directly based on linguistic data.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Context, cognition and communication in language

Author: A Caspi
A Wray
B Yanny
BM Way
C Beckner
C Ember
CL Nunn
CM Bennett
D Dediu
D Fanelli
D Nettle
DK Simonton
DT Eisenberg
E Paradis
FH Messerli
Frank Emmert-Streib
G Desideri
G Lupyan
G Murdock
H Clahsen
J Backhaus
J Hay
James Winters
JE Terrell
JG Fought
K Claesson
K Rogoff
KS Button
M Collard
M Donohue
M Dunn
M Kalisch
M Pagel
MD Pagel
MH Maathuis
MH Ross
MK Chen
MT Todd
P Trudgill
QD Atkinson
R Bouckaert
R Mace
R Naroll
R Sproat
RD Gray
S Levinson
S Moran
S Roberts
S Wichmann
SA Fritz
Seán Roberts
T Mitchell
TF Jaeger
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Questions pertaining to the unique structure and organisation of language have a long history in the field of linguistics. In recent years, researchers have explored cultural evolutionary explanations, showing how language structure emerges from weak biases amplified over repeated patterns of learning and use. One outstanding issue in these frameworks is accounting for the role of context. In particular, many linguistic phenomena are said to be context-dependent; interpretation does not take place in a void, and requires enrichment from the current state of the conversation, the physical situation, and common knowledge about the world. Modelling the relationship between language structure and context is therefore crucial for developing a cultural evolutionary approach to language. One approach is to use statistical analyses to investigate large-scale, cross-cultural datasets. However, due to the inherent limitations of statistical analyses, especially their inadequacy for testing hypotheses about causal relationships, I argue that experiments are better suited to address questions pertaining to language structure and context. From here, I present a series of artificial language experiments, with the central aim being to test how manipulations to context influence the structure and organisation of language. Experiment 1 builds upon previous work in iterated learning and communication games by demonstrating that the emergence of optimal communication systems is contingent on the contexts in which languages are learned and used. The results show that language systems gradually evolve to encode only information that is informative for conveying the intended meaning of the speaker, resulting in markedly different systems of communication.
Whereas Experiment 1 focused on how context influences the emergence of structure, Experiments 2 and 3 investigate the circumstances under which manipulations to context result in the loss of structure. While the results are inconclusive across these two experiments, there is tentative evidence that manipulations to context can disrupt structure, but only when interacting with other factors. Lastly, Experiment 4 investigates whether the degree of signal autonomy (the capacity for a signal to be interpreted without recourse to contextual information) is shaped by manipulations to contextual predictability: the extent to which a speaker can estimate and exploit contextual information a hearer uses in interpreting an utterance. When the context is predictable, speakers organise languages to be less autonomous (more context-dependent) by combining linguistic signals with contextual information to reduce effort in production and minimise uncertainty in comprehension. By decreasing contextual predictability, speakers increasingly rely on strategies that promote more autonomous signals, as these signals depend less on contextual information to discriminate between possible meanings. Overall, these experiments provide proof-of-concept for investigating the relationship between language structure and context, showing that the organisational principles underpinning language are the result of competing pressures from context, cognition, and communication.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Edinburgh Research Archive

MPG.PuRe

Explore Bristol Research

AP Adjacency as a Precedence Constraint

Author: Abeillé Anne
Ackema Peter
Ad Neeleman
Andrews Avery
Brody Michael
Bury Dirk
Büring Daniel
Chametzky Robert
Chomsky Noam
Cinque Guglielmo
Collins Chris
Corver Norbert
Culicover Peter
Dryer Matthew S
Emonds Joseph
Fanselow Gisbert
Greenberg Joseph H
Guilfoyle Eithne
Guimarães Maximiliano
Haider Hubert
Higginbotham James
Kayne Richard S
Kenesei István
Larson Richard
Larson Richard K
Lees R. B
Longobardi Giuseppe
Partee Barbara
Pereltsvaig Asya
Pollock Jean-Yves
Postal Paul
Riemsdijk Henk van
Roberts Ian
Rochemont Michael
Rothstein Susan
Rouveret Alain
Sproat Richard
Sternefeld Wolfgang
Surányi Balázs
Svenonius Peter
Szabolcsi Anna
Vanden Wyngaerd Guido
Whitman John
Williams Edwin
Wunderlich Dieter
Zamparelli Roberto
Zoë Belk
Publication venue: 'MIT Press - Journals'

Crossref

The synthesis of protected 5'-amino-2',5'-dideoxyribonucleoside-3'-O-phosphoramidites; applications of 5'-amino-oligodeoxyribonucleotides.

Author: Beijer B
Rider P
Sproat B S
Publication date: 01/01/1987

Synthetic routes to the four appropriately protected 5'-amino-2',5'-dideoxyribonucleoside-3'-O-(2-cyanoethyl N,N-diisopropylphosphoramidites) have been developed. The structures of all intermediates were confirmed by 13C n.m.r. spectroscopy. These building blocks have been used to prepare 5'-amino-oligodeoxyribonucleotides, which can be coupled to a wide variety of compounds, in particular metal cluster derivatives but also fluorophores and biotin derivatives, thus generating a variety of very useful probes. Brief mention is made of a tetrairidium cluster derivative of 5'-amino-d[CCGATATCGG], which has been cocrystallised with EcoRV and will be used for electron microscopy studies.

Crossref

PubMed Central

Chemical synthesis of a gene for somatomedin C.

Author: Gait M J
Sproat B S
Publication date: 01/01/1985

A synthetic gene for somatomedin C, a human growth factor, has been assembled by a single ligation of 23 oligodeoxyribonucleotides, which were chemically synthesized by an improved solid-phase phosphotriester method.

Crossref

PubMed Central

A new linkage for solid phase synthesis of oligodeoxyribonucleotides.

Author: Brown D M
Sproat B S
Publication date: 01/01/1985

An aryl diisocyanate has been used to attach an appropriately protected 2'-deoxyribonucleoside bearing a free 3'-hydroxyl group to a long-chain alkylamine controlled-pore glass support via a urethane moiety, in a simple two-step procedure. This obviates the need for the preparation and short-column chromatographic purification of the 2'-deoxyribonucleoside-3'-O-succinates required for preparation of the widely used succinyl-linked supports. The greater stability of the urethane bond compared to an ester bond led to substantially higher yields of oligodeoxyribonucleotides prepared by the solid-phase phosphotriester method. More than twenty oligodeoxyribonucleotides have already been synthesized on the glass support bearing the new linkage.

Crossref

PubMed Central