
    Comparison study on k-word statistical measures for protein: From sequence to 'sequence space'

    Abstract

    Background: Many proposed statistical measures can efficiently compare protein sequences to infer protein structure, function and evolutionary information. They share the same idea of using k-word frequencies of protein sequences. Given a protein sequence, the information on its related protein sequences has not been used for protein sequence comparison until now. This paper proposed a scheme to construct a protein 'sequence space' associated with the protein sequences related to the given protein, and compared the performance of statistical measures with and without the information from this 'sequence space'. This paper also presented two statistical measures for proteins: gre.k (generalized relative entropy) and gsm.k (gapped similarity measure).

    Results: We tested statistical measures, with and without protein 'sequence space', on three data sets. This not only offers a systematic and quantitative experimental assessment of these statistical measures, but also naturally complements the available comparisons of statistical measures based on protein sequence alone. Moreover, we compared our statistical measures with alignment-based measures and the existing statistical measures. The experiments were grouped into two sets. The first, performed via ROC (Receiver Operating Characteristic) analysis, aims at assessing the intrinsic ability of the statistical measures to discriminate and classify protein sequences. The second aims at assessing how well our measure does in phylogenetic analysis. Based on the experiments, several conclusions can be drawn and, from them, novel guidelines for the use of protein 'sequence space' and statistical measures were obtained.

    Conclusion: Alignment-based measures have a clear advantage when the data is highly redundant. Among the statistical measures, the most efficient is the novel gsm.k introduced in this article, followed by cos.k. When the data becomes less redundant, the gre.k measure proposed here achieves better performance, but all the other measures perform poorly on classification tasks. Almost all the statistical measures improve by exploring the information on 'sequence space' as word length increases, especially for less redundant data. The reasonable results of phylogenetic analysis confirm that Gdis.k based on 'sequence space' is a reliable measure for phylogenetic analysis. In summary, our quantitative analysis verifies that exploring the information on 'sequence space' is a promising way to improve the abilities of statistical measures for protein comparison.
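The abstract above does not define its measures, so the following is only a minimal sketch of the shared idea of comparing sequences through k-word frequencies. It assumes that a k-word is an overlapping substring of length k and that a cosine-style measure such as cos.k is the cosine of the angle between two k-word count vectors; the function names `kword_counts` and `cosine_similarity` are hypothetical and are not taken from the paper. The paper's own gre.k and gsm.k are not reproduced here.

```python
from collections import Counter
from math import sqrt


def kword_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-word (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: str, b: str, k: int = 2) -> float:
    """Cosine of the angle between the k-word count vectors of a and b.

    Assumption: this is one common reading of a cosine-based k-word
    measure, not necessarily the exact definition used in the paper.
    """
    ca, cb = kword_counts(a, k), kword_counts(b, k)
    # Only k-words present in both sequences contribute to the dot product.
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        # A sequence shorter than k has no k-words; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy sequences chosen for illustration only.
    print(cosine_similarity("MKVLAAGI", "MKVLSAGI", k=2))
```

Larger values of k make the count vectors sparser, which is one reason why short or distantly related sequences can become hard to separate with word counts alone.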

    Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences

    Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and syntactic analysis or on pre-segmented data; but these are labor-intensive, and the lexico-syntactic techniques are vulnerable to the unknown word problem. In contrast, we introduce a novel, more robust statistical method utilizing unsegmented training data. Despite its simplicity, the algorithm yields performance on long kanji sequences comparable to and sometimes surpassing that of state-of-the-art morphological analyzers over a variety of error metrics. The algorithm also outperforms another mostly-unsupervised statistical algorithm previously proposed for Chinese. Additionally, we present a two-level annotation scheme for Japanese to incorporate multiple segmentation granularities, and introduce two novel evaluation metrics, both based on the notion of a compatible bracket, that can account for multiple granularities simultaneously.
    Comment: 22 pages. To appear in Natural Language Engineering.

    Children retain implicitly learned phonological sequences better than adults: A longitudinal study

    This is the peer reviewed version of the following article: Eleonore H. M. Smalle, Mike P. A. Page, Wouter Duyck, Martin Edwards, and Arnaud Szmalec, 'Children retain implicitly learned phonological sequences better than adults: a longitudinal study', Developmental Science, December 2017, which has been published in final form at DOI: 10.1111/desc.12634. Under embargo until 17 December 2018. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.

    Whereas adults often rely on explicit memory, children appear to excel in implicit memory, which plays an important role in the acquisition of various cognitive skills, such as those involved in language. The current study aimed to test the assertion of an age-dependent shift in implicit versus explicit learning within a theoretical framework that explains the link between implicit sequence memory and word-form acquisition, using the Hebb repetition paradigm. We conducted a one-year, multiple-session longitudinal study in which we presented auditory sequences of syllables, co-presented with pictures of aliens, for immediate serial recall by a group of children (8–9 years) and by an adult group. The repetition of one Hebb sequence was explicitly announced, while the repetition of another Hebb sequence was unannounced and, therefore, implicit. Despite their overall inferior recall performance, the children showed better offline retention of the implicit Hebb sequence, compared with adults who showed a significant decrement across the delays. Adults had gained more explicit knowledge of the implicit sequence than children, but this could not explain the age-dependent decline in the delayed memory for it. There was no significant age effect for delayed memory of the explicit Hebb sequence, with both age groups showing retention. Overall performance by adults was positively correlated with measures of post-learning awareness. Performance by children was positively correlated with vocabulary knowledge. We conclude that children outperform adults in the retention over time of implicitly learned phonological sequences that will gradually consolidate into novel word-forms. The findings are discussed in the light of maturational differences for implicit versus explicit memory systems that also play a role in language acquisition. A video abstract of this article can be viewed at: https://youtu.be/G5nOfJB72t4.
    Peer reviewed. Final Accepted Version.