10 research outputs found

    Modeling Statistical Properties of Written Text

    Written text is one of the fundamental manifestations of human language, and the study of its universal regularities can give clues about how our brains process information and how we, as a society, organize and share it. Among these regularities, only Zipf's law has been explored in depth. Other basic properties, such as the existence of bursts of rare words in specific documents, have only been studied independently of each other and mainly by descriptive models. As a consequence, there is a lack of understanding of linguistic processes as complex emergent phenomena. Beyond Zipf's law for word frequencies, here we focus on burstiness, Heaps' law describing the sublinear growth of vocabulary size with the length of a document, and the topicality of document collections, which encode correlations within and across documents absent in random null models. We introduce and validate a generative model that explains the simultaneous emergence of all these patterns from simple rules. As a result, we find a connection between the bursty nature of rare words and the topical organization of texts and identify dynamic word ranking and memory across documents as key mechanisms explaining the nontrivial organization of written text. Our research can have broad implications and practical applications in computer science, cognitive science and linguistics.
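
The two regularities named in this abstract, Zipf's law (word frequency against rank) and Heaps' law (vocabulary size against document length), can be measured on any tokenized text. The following is a minimal sketch of those measurements only, not the generative model the authors propose; the function names `vocabulary_growth` and `rank_frequency` and the toy sentence are illustrative assumptions.

```python
from collections import Counter


def vocabulary_growth(tokens):
    """Number of distinct words seen after each token (a Heaps-style curve)."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


def rank_frequency(tokens):
    """Word counts ordered from most to least frequent (a Zipf-style curve)."""
    return [count for _, count in Counter(tokens).most_common()]


# Toy input (an assumption for illustration; real studies use large corpora).
tokens = "the cat sat on the mat and the dog sat on the log".split()
growth = vocabulary_growth(tokens)
freqs = rank_frequency(tokens)
```

On a real corpus, plotting `growth` against token position and `freqs` against rank on log-log axes is the usual way to inspect the sublinear vocabulary growth and the heavy-tailed frequency distribution.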

    Automatic summarization of voicemail messages using lexical and prosodic features

    This article presents trainable methods for extracting principal content words from voicemail messages. The short text summaries generated are suitable for mobile messaging applications. The system uses a set of classifiers to identify the summary words, with each word described by a vector of lexical and prosodic features. We use an ROC-based algorithm, Parcel, to select input features (and classifiers). We have performed a series of objective and subjective evaluations using unseen data from two different speech recognition systems as well as human transcriptions of voicemail speech.

    A Classification of Bioinformatics Algorithms from the Viewpoint of Maximizing Expected Accuracy (MEA)

    Many estimation problems in bioinformatics are formulated as point estimation problems in a high-dimensional discrete space. In general, it is difficult to design reliable estimators for this type of problem, because the number of possible solutions is immense, which leads to an extremely low probability for every solution, even for the one with the highest probability. Therefore, maximum score and maximum likelihood estimators do not work well in this situation, although they are widely employed in a number of applications. Maximizing expected accuracy (MEA) estimation, in which accuracy measures of the target problem and the entire distribution of solutions are considered, is a more successful approach. In this review, we provide an extensive discussion of algorithms and software based on MEA. We describe how a number of algorithms used in previous studies can be classified from the viewpoint of MEA. We believe that this review will be useful not only for users wishing to utilize software to solve the estimation problems appearing in this article, but also for developers wishing to design algorithms on the basis of MEA.
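
The contrast between the highest-probability solution and the MEA solution can be seen on a toy distribution. The sketch below assumes a hypothetical three-position binary labeling problem and uses the number of matching positions as the accuracy measure; both choices are illustrative assumptions and not taken from the review, which covers many accuracy measures and problem types.

```python
from itertools import product

# Hypothetical distribution over binary labelings of three positions.
dist = {
    (0, 0, 0): 0.4,
    (1, 1, 0): 0.3,
    (1, 1, 1): 0.3,
}


def accuracy(candidate, reference):
    """Assumed accuracy measure: number of positions where the labels agree."""
    return sum(c == r for c, r in zip(candidate, reference))


def expected_accuracy(candidate, dist):
    """Accuracy of the candidate averaged over the whole distribution."""
    return sum(p * accuracy(candidate, sol) for sol, p in dist.items())


# Highest-probability estimate: the single most probable solution.
map_solution = max(dist, key=dist.get)

# MEA estimate: brute-force search over every possible labeling.
candidates = list(product([0, 1], repeat=3))
mea_solution = max(candidates, key=lambda c: expected_accuracy(c, dist))
```

Here the most probable labeling is `(0, 0, 0)`, but the other two solutions agree with each other on the first two positions, so `(1, 1, 0)` has the higher expected accuracy. Brute-force enumeration is only feasible for toy sizes; practical MEA algorithms rely on marginal probabilities and dynamic programming.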

    ABRAXAS1 orchestrates BRCA1 activities to counter genome destabilizing repair pathways: lessons from breast cancer patients

    It has been well established that mutations in BRCA1 and BRCA2, compromising functions in DNA double-strand break repair (DSBR), confer hereditary breast and ovarian cancer risk. Importantly, mutations in these genes explain only a minor fraction of the hereditary risk and of the subset of DSBR-deficient tumors. Our screening efforts identified two truncating germline mutations in the gene encoding the BRCA1 complex partner ABRAXAS1 in German early-onset breast cancer patients. To unravel the molecular mechanisms triggering carcinogenesis in these carriers of heterozygous mutations, we examined DSBR functions in patient-derived lymphoblastoid cells (LCLs) and in genetically manipulated mammary epithelial cells. Using these strategies, we were able to demonstrate that these truncating ABRAXAS1 mutations exerted dominant effects on BRCA1 functions. Interestingly, we did not observe haploinsufficiency regarding homologous recombination (HR) proficiency (reporter assay, RAD51-foci, PARP-inhibitor sensitivity) in mutation carriers. However, the balance was shifted to use of mutagenic DSBR pathways. The dominant effect of truncated ABRAXAS1 devoid of the C-terminal BRCA1 binding site can be explained by retention of the N-terminal interaction sites for other BRCA1-A complex partners like RAP80. In this case BRCA1 was channeled from the BRCA1-A to the BRCA1-C complex, which induced single-strand annealing (SSA). Further truncation, additionally deleting the coiled-coil region of ABRAXAS1, unleashed excessive DNA damage responses (DDRs), de-repressing multiple DSBR pathways including SSA and non-homologous end-joining (NHEJ). Our data reveal de-repression of low-fidelity repair activities as a common feature of cells from patients with heterozygous mutations in genes encoding BRCA1 and its complex partners.

    FORMAL IS NATURAL: TOWARD AN ECOLOGICAL PHONOLOGY

    Gibbon D. FORMAL IS NATURAL: TOWARD AN ECOLOGICAL PHONOLOGY. POZNAN STUDIES IN CONTEMPORARY LINGUISTICS. 2009;45(1):73-102.
    Natural Phonology (NP) has a history of opposition to abstractness, to generative linguistics, and to formalist approaches, and differs from these in its strong focus on external rather than distributional, structural evidential domains. But evidence domains are orthogonal to empirical and formal methods, and, like formalist theories such as Optimality Theory (OT), the pedigree of NP includes structuralist and generative phonology. In an analysis which is sympathetic to both NP and OT, this contribution examines the relation between NP and OT, analyses a classic OT case study of syllabification in Tashlhiyt Berber, and presents computational linguistic analyses of this case, as well as of English syllable phonotactics and of tone language tonotactics. The contribution advocates an opening towards these methods, and the adoption of explicit, consistent, precise, complete and sound formal criteria for theories, which enable an exact interpretation in terms of operational models and computational implementations, and practical applications. The general frame of reference is the Ecological Cycle in theory formation, from clarification of the domain through theory construction, interpretation with a model, evaluation and application in the original evidential domain, with payback to the language community from which the evidence was gained.