A Coverage Criterion for Spaced Seeds and its Applications to Support Vector Machine String Kernels and k-Mer Distances

Author: Laurent Noé
Donald E.K. Martin
Apostolico A.
Bassino F.
Boden M.
Břinda K.
Burkhardt S.
Egidi L.
Gambin A.
Leslie C.S.
Martin D.E.K.
Régnier M.
Simon I.
Zhou L.
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/01/2014

Spaced seeds have recently been shown not only to detect more alignments, but also to give a more accurate measure of phylogenetic distances (Boden et al., 2013; Horwege et al., 2014; Leimeister et al., 2014) and to provide a lower misclassification rate when used with Support Vector Machines (SVMs) (Onodera and Shibuya, 2013). We confirm these two results by independent experiments, and propose in this article to use a coverage criterion (Benson and Mak, 2008; Martin, 2013; Martin and Noé, 2014) to measure seed efficiency in both cases in order to design better seed patterns. We first show how this coverage criterion can be measured directly by a full automaton-based approach. We then illustrate how this criterion performs when compared with two other frequently used criteria, namely the single-hit and multiple-hit criteria, through correlation coefficients with the correct classification or the true distance. Finally, for alignment-free distances, we propose an extension by adopting the coverage criterion, show how it performs, and indicate how it can be computed efficiently.

Comment: http://online.liebertpub.com/doi/abs/10.1089/cmb.2014.017

arXiv.org e-Print Archive

HAL - Lille 3

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Copenhagen University Research Information System

A Coverage Criterion for Spaced Seeds and its Applications to Support Vector Machine String Kernels and k-Mer Distances

Author: Martin Donald E. K.
Noé Laurent
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/01/2014

arXiv.org e-Print Archive

HAL - Lille 3

CiteSeerX

INRIA a CCSD electronic archive server

PubMed Central

Recommended from our members

The folded k-spectrum kernel: A machine learning approach to detecting transcription factor binding sites with gapped nucleotide dependencies

Author: Dresch Jacqueline M.
Elmas Abdulkadir
Wang Xiaodong
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2017

Understanding the molecular machinery involved in transcriptional regulation is central to improving our knowledge of an organism’s development, disease, and evolution. The building blocks of this complex molecular machinery are an organism’s genomic DNA sequence and transcription factor proteins. Despite the vast amount of sequence data now available for many model organisms, predicting where transcription factors bind, often referred to as ‘motif detection’, is still incredibly challenging. In this study, we develop a novel bioinformatic approach to binding site prediction. We do this by extending pre-existing SVM approaches in an unbiased way to include all possible gapped k-mers, representing different combinations of complex nucleotide dependencies within binding sites. We show the advantages of this new approach over existing SVM approaches through a rigorous set of cross-validation experiments. We also demonstrate the effectiveness of our new approach by reporting its improved performance on a set of 127 genomic regions known to regulate gene expression along the anterior-posterior axis in early Drosophila embryos.

Columbia University Academic Commons

Directory of Open Access Journals

FigShare

Exact ground states of a staggered supersymmetric model for lattice fermions

Author: J. Jonsson
J. Vala
K. Schoutens
L. Huijse
N. Moran
P. Csorba
R. Bott
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2011

We study a supersymmetric model for strongly interacting lattice fermions in the presence of a staggering parameter. The staggering is introduced as a tunable parameter in the manifestly supersymmetric Hamiltonian. We obtain analytic expressions for the ground states in the limits of small and large staggering for the model on the class of doubly decorated lattices. On this type of lattice there are two ground states, each with a different density. In one limit we find these ground states to be a simple Wigner crystal and a valence bond solid (VBS) state. In the other limit we find two types of quantum liquids. As a special case, we investigate in detail the quantum liquid state on the one-dimensional chain. It is characterized by a massless kink that separates two types of order.

Comment: 21 pages, 6 figures, v2: largely rewritten version with more emphasis on physical interpretation

arXiv.org e-Print Archive

Crossref

MURAL - Maynooth University Research Archive Library

NUI Maynooth Eprint Archive

Maynooth University ePrints and eTheses Archive

HAL Descartes

Hal-Diderot

International Migration, Integration and Social Cohesion online publications

The Structure of Alignment-Free Algorithms for Diverse Post-Genomic Data

Author: Onodera Taku
小野寺拓
Publication venue: Graduate School of Information Science and Technology, Department of Computer Science
Publication date: 11/11/2015

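Degree type: Doctorate (by coursework). Examination committee: (Chair) Professor Hiroshi Imai, University of Tokyo; Professor Naoki Kobayashi, University of Tokyo; Professor Takeo Igarashi, University of Tokyo; Professor Masashi Sugiyama, University of Tokyo; Lecturer Masahiro Kasahara, University of Tokyo. University of Tokyo (東京大学)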

Identification of functionally related enzymes by learning-to-rank methods

Author: Airola Antti
De Baets Bernard
Fober Thomas
Glinca Serghei
Hüllermeier Eyke
Klebe Gerhard
Pahikkala Tapio
Stock Michiel
Waegeman Willem
Publication venue
Publication date: 01/01/2014

Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually starts from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We use the Enzyme Commission (EC) classification hierarchy to obtain annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.

arXiv.org e-Print Archive

Ghent University Academic Bibliography