Search CORE


Identification of functionally related enzymes by learning-to-rank methods

Author: Airola Antti
De Baets Bernard
Fober Thomas
Glinca Serghei
Hüllermeier Eyke
Klebe Gerhard
Pahikkala Tapio
Stock Michiel
Waegeman Willem
Publication date: 01/01/2014

Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually starts from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work, we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.
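The abstract contrasts similarity search, which ignores annotations, with a kernel-based learner trained on annotated enzymes. A minimal sketch of that contrast follows. It assumes a precomputed, symmetric similarity matrix `K` among annotated enzymes and a query's similarity vector `k_query` to them. It then fits kernel ridge regression on (query, database) pairs with a Kronecker product kernel, using a target of 1 when two enzymes share an EC label. This learner is an illustrative stand-in of our choosing, not necessarily the paper's algorithm. All similarity values, labels and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_pairwise(K: Matrix, labels: Sequence[str], lam: float):
    """Kernel ridge regression on (query, database) pairs of annotated enzymes."""
    n = len(K)
    pairs: List[Tuple[int, int]] = [(i, j) for i in range(n) for j in range(n)]
    # Relevance target: 1.0 if the two enzymes share a (hypothetical) EC label.
    y = [1.0 if labels[i] == labels[j] else 0.0 for i, j in pairs]
    # Kronecker pairwise kernel: k((a, b), (c, d)) = K[a][c] * K[b][d].
    G = [[K[a][c] * K[b][d] for (c, d) in pairs] for (a, b) in pairs]
    for t in range(len(pairs)):
        G[t][t] += lam  # ridge term
    return pairs, solve(G, y)


def learned_scores(k_query: Sequence[float], K: Matrix, pairs, alpha) -> List[float]:
    """Score every annotated enzyme j for a new query with similarities k_query."""
    return [
        sum(a * k_query[c] * K[j][d] for a, (c, d) in zip(alpha, pairs))
        for j in range(len(K))
    ]


def rank(scores: Sequence[float]) -> List[int]:
    """Indices sorted from highest to lowest score (ties keep index order)."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])


# Toy data (hypothetical): enzymes 0, 1 in class "A"; enzymes 2, 3 in class "B".
K = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
labels = ["A", "A", "B", "B"]
k_query = [0.7, 0.2, 0.3, 0.1]  # a query most similar to enzyme 0

baseline = rank(k_query)  # plain similarity search, no annotations used
pairs, alpha = fit_pairwise(K, labels, lam=1e-3)
learned = rank(learned_scores(k_query, K, pairs, alpha))
print("baseline:", baseline)
print("learned: ", learned)
```

On this toy data, plain similarity search ranks enzyme 2 (class B) second because its raw similarity of 0.3 exceeds that of enzyme 1. The learned scores, by contrast, place both class A enzymes ahead of both class B enzymes. The dense solve over all n² pairs scales poorly, so larger training sets would need methods that exploit the Kronecker structure.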

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Discrete Elastic Inner Vector Spaces with Application in Time Series and Sequence Mining

Author: Bonnel Nicolas
Marteau Pierre-François
Ménier Gildas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/06/2012

This paper proposes a framework dedicated to the construction of what we call discrete elastic inner products, which allow one to embed sets of non-uniformly sampled multivariate time series, or sequences of varying lengths, into inner product space structures. This framework is based on a recursive definition that covers the case of multiple embedded time-elastic dimensions. We prove that such inner products exist in our general framework and show how a simple instance of this inner product class operates on some prospective applications, while generalizing the Euclidean inner product. Classification experiments on time series and symbolic sequence datasets demonstrate the benefits that can be expected from embedding time series or sequences into elastic inner spaces rather than into classical Euclidean spaces. These experiments show good accuracy compared with the Euclidean distance or even dynamic programming algorithms, while maintaining linear algorithmic complexity at the exploitation stage, although a quadratic indexing phase is required beforehand.
Comment: arXiv admin note: substantial text overlap with arXiv:1101.431
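The sketch below makes the idea of a recursively defined elastic inner product concrete with a toy recursion of our own. It is an assumption, not the definition from the paper. The recursion is bilinear in the two sequences and accepts sequences of different lengths. With the elastic weight `nu` set to 0 on equal-length sequences, it reduces to the ordinary Euclidean inner product, echoing the abstract's claim that the framework generalizes it. We do not claim that it is positive definite for `nu > 0`, which a genuine inner product would require. The function name `elastic_product` is hypothetical.

```python
from typing import List, Sequence


def elastic_product(x: Sequence[float], y: Sequence[float], nu: float) -> float:
    """Toy recursive 'elastic' bilinear form between two 1-D sequences.

    Hypothetical recursion, not the definition from the paper:
        E[i][j] = x[i-1]*y[j-1] + nu*E[i-1][j] + nu*E[i][j-1] + E[i-1][j-1]
    with E[0][*] = E[*][0] = 0. The sequences may differ in length.
    """
    n, m = len(x), len(y)
    E: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = (
                x[i - 1] * y[j - 1]      # local product of the aligned samples
                + nu * E[i - 1][j]       # elastic move: advance x only
                + nu * E[i][j - 1]       # elastic move: advance y only
                + E[i - 1][j - 1]        # synchronous move: advance both
            )
    return E[n][m]


# nu = 0 with equal lengths: only diagonal moves survive, giving the dot product.
print(elastic_product([1, 2, 3], [4, 5, 6], nu=0.0))  # prints 32.0
# nu > 0 also compares sequences of different lengths.
print(elastic_product([1, 2, 3], [1, 2], nu=0.5))
```

Direct evaluation fills an (n + 1) × (m + 1) table, so it costs O(nm) time for sequences of lengths n and m.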

arXiv.org e-Print Archive

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Simple identification tools in FishBase

Author: Atanacio Rachek
Bailly Nicolas
Froese Rainer
Reyes Jr. Rodolfo
Publication venue: EUT - Edizioni Università di Trieste
Publication date: 01/01/2010

Simple identification tools for fish species were included in the FishBase information system from its inception. Early tools made use of the relational model and characters such as fin ray meristics. Soon, pictures and drawings were added as further help, similar to a field guide. Later came the computerization of existing dichotomous keys, again in combination with pictures and other information, and the ability to restrict possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages, and suggests various options for further development. It explores the possibility of a holistic and integrated computer-aided strategy.
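The key-based workflow described above walks a dichotomous key and then narrows the candidates by country. A minimal sketch of that workflow follows. It is a hypothetical illustration, not FishBase's implementation. The couplets, species names, countries and occurrence table are placeholders, and `run_key` and `restrict` are made-up helper names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Union


@dataclass
class Couplet:
    """One step of a dichotomous key: a yes/no question with two branches."""
    question: str
    yes: "Node"
    no: "Node"


Node = Union[Couplet, str]  # a leaf is simply a species name


def run_key(node: Node, answer: Callable[[str], bool]) -> str:
    """Walk the key, asking `answer` each question until a species is reached."""
    while isinstance(node, Couplet):
        node = node.yes if answer(node.question) else node.no
    return node


def restrict(candidates: Set[str], occurrence: Dict[str, Set[str]], country: str) -> Set[str]:
    """Keep only candidates recorded from `country` (hypothetical occurrence table)."""
    return {s for s in candidates if country in occurrence.get(s, set())}


# Hypothetical key and data, for illustration only.
KEY = Couplet(
    "Dorsal fin with more than 10 spines?",
    yes=Couplet("Pelvic fins present?", yes="Species A", no="Species B"),
    no="Species C",
)
OCCURRENCE = {
    "Species A": {"Country X"},
    "Species B": {"Country Y"},
    "Species C": {"Country X", "Country Y"},
}

observed = {"Dorsal fin with more than 10 spines?": True, "Pelvic fins present?": False}
result = run_key(KEY, lambda q: observed[q])
local = restrict({"Species A", "Species B", "Species C"}, OCCURRENCE, "Country X")
print(result, sorted(local))
```

Keeping the key as plain data makes it easy to attach pictures or other information to each couplet. The country filter can then run before or after the key to shrink the set of candidates.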

OceanRep

OpenstarTs

Pathway Analysis: State of the Art

Author: Enrique Hernández-Lemus
Jesús Espinal-Enríquez
Miguel A. García-Campos
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2015

Frontiers - Publisher Connector

LALNVIEW: a graphical viewer for pairwise sequence alignments

Author: Duret Laurent
Gasteiger Elisabeth
Perrière Guy
Publication date: 02/08/2017

LALNVIEW is a graphical program for visualising local alignments between two sequences (protein or nucleic acid). Sequences are represented by coloured rectangles to give an overall picture of their similarities. LALNVIEW can display sequence features (exon, intron, active site, domain, propeptide, etc.) along with the alignment. When using LALNVIEW through our Web servers, sequence features are automatically extracted from database annotations (SWISS-PROT, GenBank, EMBL or HOVERGEN) and displayed with the alignment. LALNVIEW is a useful tool for analysing pairwise sequence alignments and for making the link between sequence homology and what is known about the structure or function of sequences. LALNVIEW executables for UNIX, Macintosh and PC computers are freely available from our server (http://expasy.hcuge.ch/sprot/lalnview.html).

RERO DOC Digital Library

Bioinformatics: A Way Forward to Explore “Plant Omics”

Author: Iqbal Muhammad Atif
Rahman Mahmood-ur-
Rahman Mehboob-ur-
Shaheen Tayyaba
Zafar Yusuf
Publication venue: 'IntechOpen'
Publication date: 27/07/2016

Bioinformatics, a computer-assisted science aimed at managing huge volumes of genomic data, is an emerging discipline that combines the power of computers, mathematical algorithms, and statistical concepts to solve multiple genetic and biological puzzles. This science has progressed in parallel with the evolution of genome-sequencing tools, for example, the next-generation sequencing technologies, which have enabled the assembly and analysis of genome-sequencing information from large genomes. The synergism of “plant omics” and bioinformatics has set a firm foundation for deducing the ancestral karyotypes of multiple plant families, predicting genes, etc. Moreover, large genomic datasets can be assembled to extract the maximum information from voluminous “omics” data. The science of bioinformatics is handicapped by a lack of appropriate computational procedures for assembling sequencing reads of the homologs occurring in complex genomes such as cotton (2n = 4x = 52) and wheat (2n = 6x = 42), and by a shortage of trained multidisciplinary manpower. In addition, the rapid expansion of sequencing data limits the capacity to acquire, store, distribute, and analyze genomic information. In the future, new high-tech computational tools and skills, together with improved biological expertise, would provide better insight into genomes, and this information would help sustain crop productivity on this planet.

IntechOpen

A D.C. Programming Approach to the Sparse Generalized Eigenvalue Problem

Author: Lanckriet Gert
Sriperumbudur Bharath
Torres David
Publication date: 01/01/2009

In this paper, we consider the sparse eigenvalue problem wherein the goal is to obtain a sparse solution to the generalized eigenvalue problem. We achieve this by constraining the cardinality of the solution to the generalized eigenvalue problem and obtain sparse principal component analysis (PCA), sparse canonical correlation analysis (CCA) and sparse Fisher discriminant analysis (FDA) as special cases. Unlike the ℓ1-norm approximation to the cardinality constraint, which previous methods have used in the context of sparse PCA, we propose a tighter approximation that is related to the negative log-likelihood of a Student's t-distribution. The problem is then framed as a d.c. (difference of convex functions) program and is solved as a sequence of convex programs by invoking the majorization-minimization method. The resulting algorithm is proved to exhibit global convergence behavior, i.e., for any random initialization, the sequence (subsequence) of iterates generated by the algorithm converges to a stationary point of the d.c. program. The performance of the algorithm is empirically demonstrated on both sparse PCA (finding few relevant genes that explain as much variance as possible in a high-dimensional gene dataset) and sparse CCA (cross-language document retrieval and vocabulary selection for music retrieval) applications.
Comment: 40 pages
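The abstract combines two ingredients: a log-type surrogate for cardinality that is tighter than the ℓ1-norm, and majorization-minimization over a sequence of convex problems. Both can be illustrated on a toy problem. The sketch below is our own simplification, not the paper's sparse generalized eigenvalue algorithm. We assume the surrogate sum log(1 + |x_i|/eps) / log(1 + 1/eps). Instead of the constrained eigenvalue problem, we minimize the separable denoising objective 0.5 * ||x - z||^2 + rho * sum log(eps + |x_i|). Its penalty differs from the surrogate only by a constant and a rescaling of rho, and each convex subproblem has a closed-form soft-thresholding solution. The names `card_surrogate`, `objective` and `mm_denoise` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def card_surrogate(x: Sequence[float], eps: float) -> float:
    """Smooth surrogate of ||x||_0: sum log(1 + |x_i|/eps) / log(1 + 1/eps)."""
    return sum(math.log(1.0 + abs(v) / eps) for v in x) / math.log(1.0 + 1.0 / eps)


def objective(x: Sequence[float], z: Sequence[float], rho: float, eps: float) -> float:
    """Toy objective 0.5*||x - z||^2 + rho * sum log(eps + |x_i|)."""
    return sum(0.5 * (xi - zi) ** 2 + rho * math.log(eps + abs(xi)) for xi, zi in zip(x, z))


def mm_denoise(z: Sequence[float], rho: float, eps: float,
               iters: int = 50) -> Tuple[List[float], List[float]]:
    x = list(z)
    history = [objective(x, z, rho, eps)]
    for _ in range(iters):
        # Majorize the concave log(eps + t), t = |x_i|, by its tangent at the
        # current iterate; the tangent slopes act as per-coordinate l1 weights.
        w = [1.0 / (eps + abs(xi)) for xi in x]
        # The convex surrogate 0.5*(x_i - z_i)^2 + rho*w_i*|x_i| is minimized
        # exactly by soft-thresholding z_i at level rho*w_i.
        x = [math.copysign(max(abs(zi) - rho * wi, 0.0), zi) for zi, wi in zip(z, w)]
        history.append(objective(x, z, rho, eps))
    return x, history


print(card_surrogate([0.0, 1.0, -1.0, 0.0], eps=1e-3))  # exact (2.0) for entries in {0, +-1}
x, history = mm_denoise([3.0, 0.05], rho=0.1, eps=0.01)
print(x)
```

The small entry is driven exactly to zero, while the large one is only slightly shrunk, unlike a plain ℓ1 penalty, which would shrink both by the same amount. The `history` list records the objective after each step. Because each majorizer touches the objective at the current iterate and is minimized exactly, the objective never increases. This monotone decrease is the property that MM convergence arguments rely on.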

arXiv.org e-Print Archive

CiteSeerX