BiPhone: Modeling Inter Language Phonetic Influences in Text
A large number of people are forced to use the Web in a language they have
low literacy in due to technology asymmetries. Written text in the second
language (L2) from such users often contains a large number of errors that are
influenced by their native language (L1). We propose a method to mine phoneme
confusions (sounds in L2 that an L1 speaker is likely to conflate) for pairs of
L1 and L2. These confusions are then plugged into a generative model (Bi-Phone)
for synthetically producing corrupted L2 text. Through human evaluations, we
show that Bi-Phone generates plausible corruptions that differ across L1s and
also have widespread coverage on the Web. We also corrupt the popular language
understanding benchmark SuperGLUE with our technique (FunGLUE for Phonetically
Noised GLUE) and show that SoTA language understanding models perform poorly. We
also introduce a new phoneme prediction pre-training task which helps byte
models to recover performance close to SuperGLUE. Finally, we also release the
FunGLUE benchmark to promote further research in phonetically robust language
models. To the best of our knowledge, FunGLUE is the first benchmark to
introduce L1-L2 interactions in text. Comment: Accepted at ACL 202
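The phoneme-confusion step described in this abstract can be illustrated with a minimal sketch. It is not the Bi-Phone model itself: the confusion table, its probabilities, and the function name `corrupt_phonemes` are hypothetical, and the sketch covers only the substitution of phonemes, not the mining of confusions or the mapping back to written text.

```python
import random

# Hypothetical confusion table for one (L1, L2) pair. Each L2 phoneme maps to
# substitutes an L1 speaker might produce, with made-up probabilities.
# These values are illustrative assumptions, not data from the paper.
EXAMPLE_CONFUSIONS = {
    "v": [("w", 0.3)],
    "z": [("j", 0.2), ("s", 0.1)],
}


def corrupt_phonemes(phonemes, confusions, rng):
    """Return a copy of `phonemes` with some entries swapped for confusable ones.

    For each phoneme, one random draw decides whether it is replaced by one of
    its listed substitutes (cumulative probabilities) or left unchanged.
    """
    corrupted = []
    for phoneme in phonemes:
        draw = rng.random()
        cumulative = 0.0
        chosen = phoneme
        for substitute, probability in confusions.get(phoneme, []):
            cumulative += probability
            if draw < cumulative:
                chosen = substitute
                break
        corrupted.append(chosen)
    return corrupted


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same sample
    print(corrupt_phonemes(["v", "a", "n"], EXAMPLE_CONFUSIONS, rng))
```

Passing a seeded `random.Random` keeps the sampling reproducible, which matters when a corrupted benchmark has to be regenerated identically.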
Sequential NMR assignments of labile protons in DNA using two-dimensional nuclear-Overhauser-enhancement spectroscopy with three jump-and-return pulse sequences
Two-dimensional nuclear Overhauser enhancement (NOESY) spectra of labile protons were recorded in H2O solutions of a protein and of a DNA duplex, using a modification of the standard NOESY experiment with all three 90 degree pulses replaced by jump-and-return sequences. For the protein as well as the DNA fragment the strategically important spectral regions could be recorded with good sensitivity and free of artifacts. Using this procedure, sequence-specific assignments were obtained for the imino protons, C2H of adenine, and C4NH2 of cytosine in a 23-base-pair DNA duplex which includes the 17-base-pair OR3 repressor binding site of bacteriophage lambda. Based on comparison with previously published results on the isolated OR3 binding site, these data were used for a study of chain termination effects on the chemical shifts of imino proton resonances of DNA duplexes
Dating the Origin of Language Using Phonemic Diversity
Language is a key adaptation of our species, yet we do not know when it evolved. Here, we use data on language phonemic diversity to estimate a minimum date for the origin of language. We take advantage of the fact that phonemic diversity evolves slowly and use it as a clock to calculate how long the oldest African languages would have to have been around in order to accumulate the number of phonemes they possess today. We use a natural experiment, the colonization of Southeast Asia and Andaman Islands, to estimate the rate at which phonemic diversity increases through time. Using this rate, we estimate that present-day languages date back to the Middle Stone Age in Africa. Our analysis is consistent with the archaeological evidence suggesting that complex human behavior evolved during the Middle Stone Age in Africa, and does not support the view that language is a recent adaptation that has sparked the dispersal of humans out of Africa. While some of our assumptions require testing and our results rely at present on a single case-study, our analysis constitutes the first estimate of when language evolved that is directly based on linguistic data
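The clock reasoning in this abstract can be sketched as simple arithmetic. The sketch assumes phonemes accumulate linearly over time, which the abstract does not state. The function names are hypothetical and every number is a placeholder, not a value from the study.

```python
def accumulation_rate(phonemes_gained, years_elapsed):
    """Phonemes gained per year, estimated from a dated colonization event."""
    if years_elapsed <= 0:
        raise ValueError("years_elapsed must be positive")
    return phonemes_gained / years_elapsed


def minimum_age(phonemes_today, phonemes_at_origin, rate):
    """Years needed to grow from the starting inventory to today's inventory.

    Assumes a constant (linear) accumulation rate, which is a simplification.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return (phonemes_today - phonemes_at_origin) / rate


if __name__ == "__main__":
    # Placeholder inputs chosen only to show the calculation.
    rate = accumulation_rate(phonemes_gained=10, years_elapsed=50_000)
    print(minimum_age(phonemes_today=40, phonemes_at_origin=10, rate=rate))
```

The estimate is only as good as the rate: a slower calibrated rate pushes the minimum date further back in direct proportion.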
Context, cognition and communication in language
Questions pertaining to the unique structure and organisation of language have a
long history in the field of linguistics. In recent years, researchers have explored
cultural evolutionary explanations, showing how language structure emerges from
weak biases amplified over repeated patterns of learning and use. One outstanding
issue in these frameworks is accounting for the role of context. In particular,
many linguistic phenomena are said to be context-dependent; interpretation
does not take place in a void, and requires enrichment from the current state
of the conversation, the physical situation, and common knowledge about the
world. Modelling the relationship between language structure and context is
therefore crucial for developing a cultural evolutionary approach to language.
One approach is to use statistical analyses to investigate large-scale, cross-cultural
datasets. However, due to the inherent limitations of statistical analyses, especially
with regard to the inadequacy of these methods for testing hypotheses about
causal relationships, I argue that experiments are better suited to address questions
pertaining to language structure and context. From here, I present a series
of artificial language experiments, with the central aim being to test how
manipulations to context influence the structure and organisation of language.
Experiment 1 builds upon previous work in iterated learning and communication
games through demonstrating that the emergence of optimal communication systems
is contingent on the contexts in which languages are learned and used. The
results show that language systems gradually evolve to only encode information
that is informative for conveying the intended meaning of the speaker, resulting
in markedly different systems of communication. Whereas Experiment 1 focused
on how context influences the emergence of structure, Experiments 2 and 3 investigate
the circumstances under which manipulations to context result in the loss
of structure. While the results are inconclusive across these two experiments,
there is tentative evidence that manipulations to context can disrupt structure,
but only when interacting with other factors. Lastly, Experiment 4 investigates
whether the degree of signal autonomy (the capacity for a signal to be interpreted without recourse to contextual information) is shaped by manipulations
to contextual predictability: the extent to which a speaker can estimate and exploit
contextual information a hearer uses in interpreting an utterance. When
the context is predictable, speakers organise languages to be less autonomous
(more context-dependent) through combining linguistic signals with contextual
information to reduce effort in production and minimise uncertainty in comprehension.
By decreasing contextual predictability, speakers increasingly rely on
strategies that promote more autonomous signals, as these signals depend less on
contextual information to discriminate between possible meanings. Overall, these
experiments provide proof-of-concept for investigating the relationship between
language structure and context, showing that the organisational principles underpinning
language are the result of competing pressures from context, cognition,
and communication
The synthesis of protected 5'-amino-2',5'-dideoxyribonucleoside-3'-O-phosphoramidites; applications of 5'-amino-oligodeoxyribonucleotides.
Synthetic routes to the four appropriately protected 5'-amino-2',5'-dideoxyribonucleoside-3'-O-(2-cyanoethyl N,N-diisopropylphosphoramidites) have been developed. The structures of all intermediates were confirmed by 13C n.m.r. spectroscopy. These building blocks have been used to prepare 5'-amino-oligodeoxyribonucleotides, which can be coupled to a wide variety of compounds, in particular metal cluster derivatives, but also fluorophores and biotin derivatives, thus generating a variety of very useful probes. Brief mention is made of a tetrairidium cluster derivative of 5'-amino-d[CCGATATCGG], which has been cocrystallised with EcoRV, and will be used for electron microscopy studies
Chemical synthesis of a gene for somatomedin C.
A synthetic gene for somatomedin C, a human growth factor, has been assembled by a single ligation of 23 oligodeoxyribonucleotides, which were chemically synthesized by an improved solid phase phosphotriester method
A new linkage for solid phase synthesis of oligodeoxyribonucleotides.
An aryl diisocyanate has been used to attach an appropriately protected 2'-deoxyribonucleoside bearing a free 3'-hydroxyl group, to a long chain alkylamine controlled pore glass support via a urethane moiety, in a simple two step procedure. This obviates the need for the preparation and short column chromatographic purification of the 2'-deoxyribonucleoside-3'-O-succinates required for preparation of the widely used succinyl linked supports. The greater stability of the urethane bond compared to an ester bond led to substantially higher yields of oligodeoxyribonucleotides prepared by the solid phase phosphotriester method. More than twenty oligodeoxyribonucleotides have already been synthesized on the glass support bearing the new linkage