178 research outputs found

    BiPhone: Modeling Inter Language Phonetic Influences in Text

    A large number of people are forced to use the Web in a language they have low literacy in due to technology asymmetries. Written text in the second language (L2) from such users often contains a large number of errors that are influenced by their native language (L1). We propose a method to mine phoneme confusions (sounds in L2 that an L1 speaker is likely to conflate) for pairs of L1 and L2. These confusions are then plugged into a generative model (Bi-Phone) for synthetically producing corrupted L2 text. Through human evaluations, we show that Bi-Phone generates plausible corruptions that differ across L1s and also have widespread coverage on the Web. We also corrupt the popular language understanding benchmark SuperGLUE with our technique (FunGLUE for Phonetically Noised GLUE) and show that SoTA language understanding models perform poorly. We further introduce a new phoneme prediction pre-training task which helps byte models to recover performance close to SuperGLUE levels. Finally, we release the FunGLUE benchmark to promote further research in phonetically robust language models. To the best of our knowledge, FunGLUE is the first benchmark to introduce L1-L2 interactions in text. Comment: Accepted at ACL 202
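The corruption step described in this abstract can be illustrated with a toy sketch. This is not the Bi-Phone model itself: the abstract says confusions are mined at the phoneme level and fed to a generative model, whereas the sketch below substitutes letter sequences directly, purely for illustration. The confusion table, its probabilities, and the function names `corrupt_word` and `corrupt_text` are all hypothetical.

```python
import random

# Hypothetical confusion table for one L1-L2 pair: each source letter
# sequence maps to (replacement, probability) pairs. The values are made
# up for illustration; they are not mined confusions from the paper.
EXAMPLE_CONFUSIONS = {
    "v": [("w", 0.3)],
    "th": [("t", 0.4), ("d", 0.1)],
}


def corrupt_word(word, confusions, rng):
    """Scan a word left to right and stochastically apply substitutions."""
    # Try longer keys first so that "th" is matched before "t".
    keys = sorted(confusions, key=len, reverse=True)
    out = []
    i = 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                replacement = key  # default: leave the sequence unchanged
                draw = rng.random()
                cumulative = 0.0
                for candidate, prob in confusions[key]:
                    cumulative += prob
                    if draw < cumulative:
                        replacement = candidate
                        break
                out.append(replacement)
                i += len(key)
                break
        else:
            # No confusion entry starts at this position; copy one character.
            out.append(word[i])
            i += 1
    return "".join(out)


def corrupt_text(text, confusions, seed=0):
    """Corrupt each whitespace-separated word with a seeded random generator."""
    rng = random.Random(seed)
    return " ".join(corrupt_word(w, confusions, rng) for w in text.split())
```

Calling `corrupt_text("they have very thin vines", EXAMPLE_CONFUSIONS, seed=1)` returns a noised variant of the sentence; changing the seed or the table changes which substitutions fire. A separate table per L1 would be one way to mimic corruptions that differ across L1s, as the abstract reports.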

    Sequential NMR assignments of labile protons in DNA using two-dimensional nuclear-Overhauser-enhancement spectroscopy with three jump-and-return pulse sequences

    Two-dimensional nuclear Overhauser enhancement (NOESY) spectra of labile protons were recorded in H2O solutions of a protein and of a DNA duplex, using a modification of the standard NOESY experiment with all three 90 degree pulses replaced by jump-and-return sequences. For the protein as well as the DNA fragment the strategically important spectral regions could be recorded with good sensitivity and free of artifacts. Using this procedure, sequence-specific assignments were obtained for the imino protons, C2H of adenine, and C4NH2 of cytosine in a 23-base-pair DNA duplex which includes the 17-base-pair OR3 repressor binding site of bacteriophage lambda. Based on comparison with previously published results on the isolated OR3 binding site, these data were used for a study of chain termination effects on the chemical shifts of imino proton resonances of DNA duplexes

    Dating the Origin of Language Using Phonemic Diversity

    Language is a key adaptation of our species, yet we do not know when it evolved. Here, we use data on language phonemic diversity to estimate a minimum date for the origin of language. We take advantage of the fact that phonemic diversity evolves slowly and use it as a clock to calculate how long the oldest African languages would have to have been around in order to accumulate the number of phonemes they possess today. We use a natural experiment, the colonization of Southeast Asia and Andaman Islands, to estimate the rate at which phonemic diversity increases through time. Using this rate, we estimate that present-day languages date back to the Middle Stone Age in Africa. Our analysis is consistent with the archaeological evidence suggesting that complex human behavior evolved during the Middle Stone Age in Africa, and does not support the view that language is a recent adaptation that has sparked the dispersal of humans out of Africa. While some of our assumptions require testing and our results rely at present on a single case-study, our analysis constitutes the first estimate of when language evolved that is directly based on linguistic data

    Context, cognition and communication in language

    Questions pertaining to the unique structure and organisation of language have a long history in the field of linguistics. In recent years, researchers have explored cultural evolutionary explanations, showing how language structure emerges from weak biases amplified over repeated patterns of learning and use. One outstanding issue in these frameworks is accounting for the role of context. In particular, many linguistic phenomena are said to be context-dependent; interpretation does not take place in a void, and requires enrichment from the current state of the conversation, the physical situation, and common knowledge about the world. Modelling the relationship between language structure and context is therefore crucial for developing a cultural evolutionary approach to language. One approach is to use statistical analyses to investigate large-scale, cross-cultural datasets. However, due to the inherent limitations of statistical analyses, especially with regard to the inadequacy of these methods for testing hypotheses about causal relationships, I argue that experiments are better suited to address questions pertaining to language structure and context. From here, I present a series of artificial language experiments, with the central aim being to test how manipulations to context influence the structure and organisation of language. Experiment 1 builds upon previous work in iterated learning and communication games by demonstrating that the emergence of optimal communication systems is contingent on the contexts in which languages are learned and used. The results show that language systems gradually evolve to only encode information that is informative for conveying the intended meaning of the speaker, resulting in markedly different systems of communication. Whereas Experiment 1 focused on how context influences the emergence of structure, Experiments 2 and 3 investigate under what circumstances manipulations to context result in the loss of structure. While the results are inconclusive across these two experiments, there is tentative evidence that manipulations to context can disrupt structure, but only when interacting with other factors. Lastly, Experiment 4 investigates whether the degree of signal autonomy (the capacity for a signal to be interpreted without recourse to contextual information) is shaped by manipulations to contextual predictability: the extent to which a speaker can estimate and exploit contextual information a hearer uses in interpreting an utterance. When the context is predictable, speakers organise languages to be less autonomous (more context-dependent) through combining linguistic signals with contextual information to reduce effort in production and minimise uncertainty in comprehension. By decreasing contextual predictability, speakers increasingly rely on strategies that promote more autonomous signals, as these signals depend less on contextual information to discriminate between possible meanings. Overall, these experiments provide proof-of-concept for investigating the relationship between language structure and context, showing that the organisational principles underpinning language are the result of competing pressures from context, cognition, and communication

    The synthesis of protected 5'-amino-2',5'-dideoxyribonucleoside-3'-O-phosphoramidites; applications of 5'-amino-oligodeoxyribonucleotides.

    Synthetic routes to the four appropriately protected 5'-amino-2',5'-dideoxyribonucleoside-3'-O-(2-cyanoethyl N,N-diisopropylphosphoramidites) have been developed. The structures of all intermediates were confirmed by 13C n.m.r. spectroscopy. These building blocks have been used to prepare 5'-amino-oligodeoxyribonucleotides, which can be coupled to a wide variety of compounds, in particular metal cluster derivatives, but also fluorophores and biotin derivatives, thus generating a variety of very useful probes. Brief mention is made of a tetrairidium cluster derivative of 5'-amino-d[CCGATATCGG], which has been cocrystallised with EcoRV, and will be used for electron microscopy studies

    Chemical synthesis of a gene for somatomedin C.

    A synthetic gene for somatomedin C, a human growth factor, has been assembled by a single ligation of 23 oligodeoxyribonucleotides, which were chemically synthesized by an improved solid phase phosphotriester method

    A new linkage for solid phase synthesis of oligodeoxyribonucleotides.

    An aryl diisocyanate has been used to attach an appropriately protected 2'-deoxyribonucleoside bearing a free 3'-hydroxyl group, to a long chain alkylamine controlled pore glass support via a urethane moiety, in a simple two step procedure. This obviates the need for the preparation and short column chromatographic purification of the 2'-deoxyribonucleoside-3'-O-succinates required for preparation of the widely used succinyl linked supports. The greater stability of the urethane bond compared to an ester bond led to substantially higher yields of oligodeoxyribonucleotides prepared by the solid phase phosphotriester method. More than twenty oligodeoxyribonucleotides have already been synthesized on the glass support bearing the new linkage