Search CORE

173,907 research outputs found

Comparison study on k-word statistical measures for protein: From sequence to 'sequence space'

Author: A Andreeva
A Bairoch
A Bateman
A Kelil
AP Bradley
B Rost
B Thiruv
BE Blaisdell
CH Wu
D Barthel
EM Taylor
F Pearl
F Ronquist
G Didier
G Fichant
G Reinert
GW Stuart
J Felsenstein
J Lowe
J Soppa
JM Word
JP Egan
JP Huelsenbeck
K Komatsu
KP Wu
LP Chew
M Hirano
M Sierk
N Cobbe
N Krasnogor
N Saitoh
P Ferragina
Qi Dai
S Hochreiter
S Kumar
S Vinga
SF Altschul
TD Pham
Tianming Wang
TJ Wu
W Li
Y Fujioka
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract

Background: Many proposed statistical measures can efficiently compare protein sequences to infer protein structure, function and evolutionary information. They share the idea of using k-word frequencies of protein sequences. Until now, for a given protein sequence, information on its related protein sequences has not been used in protein sequence comparison. This paper proposes a scheme for constructing a protein 'sequence space' from the protein sequences related to a given protein, and compares the performance of statistical measures with and without the information in that 'sequence space'. It also presents two statistical measures for proteins: gre.k (generalized relative entropy) and gsm.k (gapped similarity measure).

Results: We tested the statistical measures, with and without protein 'sequence space', on three data sets. This not only offers a systematic, quantitative experimental assessment of these measures but also complements the existing comparisons of statistical measures based on protein sequences alone. We also compared our statistical measures with alignment-based measures and with existing statistical measures. The experiments were grouped into two sets. The first, based on ROC (Receiver Operating Characteristic) analysis, assesses the intrinsic ability of the statistical measures to discriminate and classify protein sequences. The second assesses how well our measure performs in phylogenetic analysis. From these experiments we draw several conclusions and derive new guidelines for using protein 'sequence space' and statistical measures.

Conclusion: Alignment-based measures have a clear advantage when the data are highly redundant. Among the statistical measures, the most efficient is the novel gsm.k introduced in this article, followed by cos.k. When the data are less redundant, our gre.k performs better, while all the other measures perform poorly on classification tasks. Almost all the statistical measures improve when they exploit 'sequence space' information as the word length increases, especially for less redundant data. The reasonable results of the phylogenetic analysis confirm that Gdis.k based on 'sequence space' is a reliable measure for phylogenetic analysis. In summary, our quantitative analysis shows that exploiting 'sequence space' information is a promising way to improve the ability of statistical measures to compare proteins.
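The abstract builds on k-word frequencies and evaluates measures such as cos.k with ROC analysis. As a minimal sketch, the code below assumes that a cos.k-style score is the cosine between overlapping k-word count vectors; the paper's exact definitions of cos.k, gre.k, gsm.k and Gdis.k, and its 'sequence space' construction, are not given in this record and are not reproduced. The function names, toy sequences, and the pairwise AUC helper are illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def kword_counts(seq: str, k: int) -> Counter:
    """Count the overlapping k-words (substrings of length k) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse k-word count vectors.

    Returns 0.0 when either vector is empty (e.g. a sequence shorter than k).
    """
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def roc_auc(pos_scores, neg_scores) -> float:
    """Area under the ROC curve, computed as the fraction of (positive, negative)
    score pairs in which the positive score is higher; ties count as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical toy sequences: a near-identical pair and an unrelated pair.
    query = "MKTAYIAKQRQISFVKSHFSRQ"
    related = "MKTAYIAKQRQISFVKSHFSRA"
    unrelated = "GAVLIPFMWGAVLIPFMWGAVL"
    k = 2
    q = kword_counts(query, k)
    s_rel = cosine_similarity(q, kword_counts(related, k))
    s_unrel = cosine_similarity(q, kword_counts(unrelated, k))
    print(f"cosine score, related:   {s_rel:.3f}")
    print(f"cosine score, unrelated: {s_unrel:.3f}")
    print("AUC:", roc_auc([s_rel], [s_unrel]))
```

Sparse `Counter` vectors avoid materialising all 20^k possible k-words, which matters as k grows; the AUC helper is the quadratic pairwise form, adequate for small evaluation sets.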

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences

Author: Ando Rie Kubota
Lee Lillian
Publication venue: Cambridge University Press (CUP)
Publication date: 10/05/2002

Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and syntactic analysis or on pre-segmented data, but these are labor-intensive, and the lexico-syntactic techniques are vulnerable to the unknown-word problem. In contrast, we introduce a novel, more robust statistical method that uses unsegmented training data. Despite its simplicity, the algorithm yields performance on long kanji sequences comparable to, and sometimes surpassing, that of state-of-the-art morphological analyzers over a variety of error metrics. The algorithm also outperforms another mostly-unsupervised statistical algorithm previously proposed for Chinese. Additionally, we present a two-level annotation scheme for Japanese that incorporates multiple segmentation granularities, and introduce two novel evaluation metrics, both based on the notion of a compatible bracket, that can account for multiple granularities simultaneously.

Comment: 22 pages. To appear in Natural Language Engineering.
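The record does not describe the segmentation algorithm itself. Purely as an illustration of how character n-gram counts from unsegmented text can suggest word boundaries, the sketch below scores each gap by comparing the n-grams that end or start at the gap with the n-grams that straddle it, and cuts where the score is a local maximum or reaches a threshold. This is a simplified, hypothetical voting scheme, not presented as the authors' exact method; the function names, the `threshold` parameter, and the toy corpus are assumptions.

```python
from collections import Counter


def ngram_counts(corpus, n):
    """Count character n-grams over a list of unsegmented strings."""
    counts = Counter()
    for text in corpus:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def boundary_scores(text, counts_by_n):
    """Score each gap k (between text[k-1] and text[k]) in [0, 1].

    For each order n, the n-grams ending at k (left) and starting at k (right)
    are compared with every n-gram straddling k; a vote is cast whenever the
    non-straddling n-gram is more frequent in the training counts.
    """
    scores = {}
    for k in range(1, len(text)):
        votes, total = 0, 0
        for n, counts in counts_by_n.items():
            if n < 2 or k < n or k + n > len(text):
                continue  # need full n-grams on both sides and a straddling n-gram
            left, right = text[k - n:k], text[k:k + n]
            for j in range(k - n + 1, k):  # start positions of n-grams crossing the gap
                straddle = text[j:j + n]
                for side in (left, right):
                    total += 1
                    votes += counts[side] > counts[straddle]
        scores[k] = votes / total if total else 0.0
    return scores


def segment(text, counts_by_n, threshold=1.0):
    """Cut at gaps whose score is a strict local maximum or reaches the threshold."""
    s = boundary_scores(text, counts_by_n)
    cuts = [
        k for k in range(1, len(text))
        if (s[k] > s.get(k - 1, 0.0) and s[k] > s.get(k + 1, 0.0)) or s[k] >= threshold
    ]
    words, start = [], 0
    for k in cuts:
        words.append(text[start:k])
        start = k
    words.append(text[start:])
    return words


if __name__ == "__main__":
    # Hypothetical toy corpus of unsegmented strings; only bigrams are used.
    corpus = ["abc", "xyz", "abc", "xyz", "abcxyz"]
    counts_by_n = {2: ngram_counts(corpus, 2)}
    print(segment("abcxyz", counts_by_n))  # ['abc', 'xyz']
```

Latin letters keep the toy example easy to check by hand; because Python slices strings by code point, the same functions apply unchanged to kanji strings.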

arXiv.org e-Print Archive

CiteSeerX

Crossref

Children retain implicitly learned phonological sequences better than adults: A longitudinal study

Author: Adi-Japha
Amso
Archibald
Ashtamker
Bishop
Bogaerts
Brown
Cleeremans
Couture
Craik
Csabi
Dunn
Duyck
Evans
Ferman
Finn
Foerde
Gagnon
Gathercole
Graf Estes
Guerard
Gupta
Hebb
Hsu
Janacsek
Johnson
Kalm
Leach
Lichtman
Majerus
McKelvie
Meulemans
Mosse
Murphy
Nemeth
Newport
Page
Pan
Perez
Poldrack
Rickard
Robertson
Saffran
Sandberg
Schwartz
Smalle
Squire
Szmalec
Thiessen
Thomas
Turcotte
Ullman
Weber-Fox
Wechsler
Wilhelm
Publication venue: Wiley
Publication date: 01/01/2018

This is the peer-reviewed version of the following article: Eleonore H. M. Smalle, Mike P. A. Page, Wouter Duyck, Martin Edwards, and Arnaud Szmalec, 'Children retain implicitly learned phonological sequences better than adults: a longitudinal study', Developmental Science, December 2017, which has been published in final form at DOI: 10.1111/desc.12634. Under embargo until 17 December 2018. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.

Whereas adults often rely on explicit memory, children appear to excel in implicit memory, which plays an important role in the acquisition of various cognitive skills, such as those involved in language. The current study aimed to test the assertion of an age-dependent shift in implicit versus explicit learning within a theoretical framework that explains the link between implicit sequence memory and word-form acquisition, using the Hebb repetition paradigm. We conducted a one-year, multiple-session longitudinal study in which we presented auditory sequences of syllables, co-presented with pictures of aliens, for immediate serial recall by a group of children (aged 8–9 years) and by an adult group. The repetition of one Hebb sequence was explicitly announced, while the repetition of another Hebb sequence was unannounced and, therefore, implicit. Despite their overall inferior recall performance, the children showed better offline retention of the implicit Hebb sequence than the adults, who showed a significant decrement across the delays. Adults had gained more explicit knowledge of the implicit sequence than children, but this could not explain the age-dependent decline in their delayed memory for it. There was no significant age effect for delayed memory of the explicit Hebb sequence, with both age groups showing retention. Overall performance by adults was positively correlated with measures of post-learning awareness.
Performance by children was positively correlated with vocabulary knowledge. We conclude that children outperform adults in the retention over time of implicitly learned phonological sequences that will gradually consolidate into novel word-forms. The findings are discussed in the light of maturational differences for implicit versus explicit memory systems that also play a role in language acquisition. A video abstract of this article can be viewed at: https://youtu.be/G5nOfJB72t4.

Peer reviewed. Final Accepted Version

Crossref

Ghent University Academic Bibliography

DIAL UCLouvain

University of Hertfordshire Research Archive