8 research outputs found

    A Note on Zipf's Law, Natural Languages, and Noncoding DNA regions

    In Phys. Rev. Letters (73:2, 5 Dec. 94), Mantegna et al. conclude on the basis of Zipf rank-frequency data that noncoding DNA sequence regions are more like natural languages than coding regions. We argue on the contrary that an empirical fit to Zipf's "law" cannot be used as a criterion for similarity to natural languages. Although DNA is presumably an "organized system of signs" in Mandelbrot's (1961) sense, an observation of statistical features of the sort presented in the Mantegna et al. paper does not shed light on the similarity between DNA's "grammar" and natural language grammars, just as the observation of exact Zipf-like behavior cannot distinguish between the underlying processes of tossing an M-sided die or a finite-state branching process.
    Comment: compressed uuencoded postscript file, 14 pages
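    The die-tossing point in this abstract can be illustrated with a minimal sketch. It assumes a fair M-sided die in which one face acts as a word separator and the remaining faces are letters (a "random typing" setup); the function names and parameters are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_die_text(num_tosses, num_sides, seed=0):
    """Toss a fair die with `num_sides` faces; one face is the word separator."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"[: num_sides - 1]
    symbols = [" "] + list(letters)
    return "".join(rng.choice(symbols) for _ in range(num_tosses))


def rank_frequency(text):
    """Return (rank, word, frequency) tuples, most frequent word first."""
    counts = Counter(text.split())
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


# Assumed illustration values: 200,000 tosses of a 5-sided die.
table = rank_frequency(random_die_text(200_000, num_sides=5))
for rank, word, freq in table[:10]:
    print(rank, word, freq)
```

    Plotting log(frequency) against log(rank) for such output gives a roughly linear, stepped curve, even though the generating process has no grammar at all.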

    Extracellular Vesicles: Living Prototypal Communication System

    Communication is an ever-present part of our world. Such transfer of information occurs on many levels from the spoken natural languages, to artificial languages, to the cellular exchanges that govern the molecular world. Cells interact using various coded and non-coded molecules, which although not natural languages, could be considered types of biological language. These molecules are packaged into extracellular vesicles by cells from all three domains of life. Vesicles may then participate in intracellular trafficking of their cargo molecules. Or cells may secrete vesicles into the extracellular world, from where they are transported to, and taken up by, target recipient cells. Once delivered, extracellular vesicles exert a plethora of physiological and pathological effects, as well as an influence on recipient cell evolution. In executing their functions, both vesicles and their molecular cargo face evolutionary pressures over time and across habitats, forcing them to adapt to meet changing needs. This chapter will present extracellular vesicles as a highly conserved prototypal communication system

    Parallels of human language in the behavior of bottlenose dolphins

    A short review of similarities between dolphins and humans with the help of quantitative linguistics and information theory

    Análise de distribuições de distâncias entre palavras genómicas (Analysis of distance distributions between genomic words)

    The investigation of DNA has been one of the most developed areas of research in this century and the last. However, there is a long way to go to fully understand the DNA code. With the growing volume of sequenced DNA data, mathematical methods play an important role in addressing the need for efficient quantitative techniques for the detection of regions of interest and overall characteristics in these sequences. A feature of interest in the study of genomic words is their spatial distribution along a DNA sequence, which can be characterized by the distances between words. Counting such distances provides discrete distributions that may be analyzed from a statistical point of view. In this work we explore the distances between genomic words as a mathematical descriptor of DNA sequences. The main goal is to design, develop and apply statistical methods specially designed for their distributions, in order to capture information about the primary and secondary structure of DNA. The characterization of empirical inter-word distance distributions involves the problem of the exponential increase in the number of distributions as the word length increases, leading to the need for data reduction. Moreover, if the data can be validly clustered, the class labels may provide a meaningful description of similarities and differences between sets of distributions. Therefore, we explore the potential of inter-word distance distributions to yield a word clustering that highlights similar patterns of word distributions as well as summarized characteristics of each set of distributions. With the aim of performing comparative studies between genomic sequences and defining species signatures, we deduce exact distributions of inter-word distances under random scenarios. Based on these theoretical distributions, we define genomic signatures of species able to discriminate between species and to capture their evolutionary relation.
    We presume that the study of distribution similarities and the clustering procedure allow identifying words whose distance distribution differs strongly from a reference distribution or from the global behaviour of the majority of the words. One of the key topics of our research is the establishment of procedures that capture distance distributions with atypical behaviours, herein referred to as atypical distributions. In the genomic context, words with an atypical distance distribution may be related to some biological function (motifs). We expect that our results may be used to provide some sort of classification of sequences, identifying evolutionary patterns and allowing for the prediction of functional properties, thereby contributing to the advancement of knowledge about DNA sequences.
    Programa Doutoral em Matemátic
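    The inter-word distance descriptor described in this abstract can be sketched as follows. This is a minimal illustration, assuming the distance is the difference between the start positions of consecutive occurrences of a word, with overlapping occurrences counted; the thesis may use a different convention, and the function names are hypothetical.

```python
from collections import Counter


def inter_word_distances(sequence, word):
    """Distances between start positions of consecutive occurrences of `word`."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping occurrences are also found.
        start = sequence.find(word, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_distribution(sequence, word):
    """Empirical (relative-frequency) distribution of inter-word distances."""
    distances = inter_word_distances(sequence, word)
    if not distances:
        return {}
    counts = Counter(distances)
    total = len(distances)
    return {d: c / total for d, c in sorted(counts.items())}


# Toy sequence chosen for illustration only.
print(inter_word_distances("ACGACGTTACG", "ACG"))
print(distance_distribution("ACGACGTTACG", "ACG"))
```

    Each word of a given length yields one such distribution, which is why the number of distributions grows exponentially with word length (4^k words of length k over the DNA alphabet).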

    Ambiguity and entropy in the process of translation and post-editing

    This thesis analyses the way in which ambiguity is cognitively processed, in translation in general and post-editing in particular, drawing inferences from psycholinguistics, bilingualism, and entropy-based models of translation cognition. Conceptually, it assumes non-selective activation of both languages (source and target) in the translation process, and explores how entropy and entropy reduction can theoretically describe assumed mental states during disambiguation. Empirically, it uses a product-based metric of word translation entropy (HTra), and eye-movement and keystroke data from the CRITT Translation Process Research Database, to shed light on how the conceptual understanding of lexical and structural ambiguity may be manifested by observable behaviour. At the lexical level, examination of behavioural data pertaining to a high-HTra item from 217 participants translating/post-editing from English into multiple languages shows that the item tends to result in pauses in production and regression of eye movements, and that the translators’/post-editors’ corresponding scrutinization of the source text (ST) tends to involve a visual search for lower-HTra words in the co-text and, accordingly, a decrease in the average entropy of the activity unit. Regarding syntax, a Chinese relative clause in the machine translation output, which can involve a garden-path effect, is examined in terms of eye movements from 18 participants. Results show that, contrary to monolingual reading, disruptions of processing tend to occur not in the later part of the sentence where the wrong parse is disconfirmed, but in the earlier regions where the most quickly-built analysis is semantically inconsistent with the ST. Structural disambiguation and re-analysis seem to be bypassed. 
This suggests that, on the one hand, reading for post-editing receives a strong biasing effect from the ST, and on the other, argument integration is more appropriately explained from an incremental processing perspective rather than a head-driven approach, as thematic roles seem to be assigned immediately in reading for post-editing. While the lexical analysis supports a parallel disambiguation model, the structural analysis seems to support a serial one. In terms of translation models, both emphasize the impact of cross-linguistic priming and the presence of considerable horizontality in the translation process
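    The word translation entropy (HTra) metric mentioned in this abstract can be sketched as below. The sketch assumes HTra is the Shannon entropy, in bits, of the observed distribution of translation choices for one source word across translators; the function name and the sample data are hypothetical and are not taken from the thesis or the CRITT database.

```python
import math
from collections import Counter


def word_translation_entropy(translations):
    """Shannon entropy (bits) of the observed translation choices for one source word."""
    counts = Counter(translations)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical data: target-language renderings of one source word by 8 translators.
unanimous = ["banco"] * 8
split = ["banco", "banco", "orilla", "orilla", "ribera", "ribera", "margen", "margen"]

print(word_translation_entropy(unanimous))  # all translators agree: low entropy
print(word_translation_entropy(split))      # four equally likely choices: higher entropy
```

    Under this definition, a source word that every translator renders identically has zero entropy, while a word with many competing renderings has high entropy, which is the sense in which the abstract speaks of high-HTra and lower-HTra items.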

    The salience of the hues: colour cognition from an indigenous Australian perspective

    Does natural language determine the way we think? If the so-called Sapir-Whorf hypothesis of linguistics were true, then colour categorization would be an entirely arbitrary process dependent solely on the language that we speak. For a while in academic circles this was the received wisdom: the fact that English had 11 colour terms and Dugum Dani (from Papua New Guinea) had only two was attributed to the language in use. In 1969, Brent Berlin and Paul Kay ushered in the paradigm shift of Basic Colour Terms, a theory that defined the structure and evolution of colour terms in cultures (Berlin and Kay, 1969). According to this theory there was a neurophysiological basis for colour categorization, which implied that all human beings had the potential to see the same colours but that naming the categories was an evolutionary process tied to the technological sophistication of a society. Language, in other words, played little or no part in colour perception. The initial theory was highly controversial, and demand grew for verification of the initial findings. To address the contentious issues surrounding Berlin and Kay’s theory, a project entitled the World Colour Survey was initiated with the goal of determining colour categorization patterns within 110 pre-literate cultures across the globe. This project has spanned more than 30 years, with a definitive publication of the results still in the works. The current PhD project, which has been in progress since the dawn of time, involves an independent analysis and interpretation of the indigenous Australian component of the World Colour Survey raw data on colour categorization. It is both an exercise in secondary analysis as a research method and a tangential meditation on the ambiguity of knowledge that can be derived from extant data sets.

    A Note on Zipf's Law, Natural Languages, and Noncoding DNA Regions.
