Search CORE

12,205 research outputs found

Automated generation of heuristics for biological sequence comparison

Author: Birney Ewan
Slater Guy St C
Publication venue: BioMed Central
Publication date: 01/01/2005
Field of study

BACKGROUND: Exhaustive methods of sequence alignment are accurate but slow, whereas heuristic approaches run quickly, but their complexity makes them more difficult to implement. We introduce bounded sparse dynamic programming (BSDP) to allow rapid approximation to exhaustive alignment. This is used within a framework whereby the alignment algorithms are described in terms of their underlying model, to allow automated development of efficient heuristic implementations which may be applied to a general set of sequence comparison problems. RESULTS: The speed and accuracy of this approach compares favourably with existing methods. Examples of its use in the context of genome annotation are given. CONCLUSIONS: This system allows rapid implementation of heuristics approximating to many complex alignment models, and has been incorporated into the freely available sequence alignment program, exonerate

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

Author: A Löytynoja
A Löytynoja
B Sipos
BG Hall
BG Hall
BP Blackburne
C Chothia
C Dessimoz
C Kemena
C Kemena
C Notredame
CB Do
CL Strope
DA Dalquen
DA Morrison
DH Mathews
ER Mardis
G Blackshields
G Jordan
G Landan
GP Raghava
I Walle Van
J Kim
J Stoye
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JH Havgaard
JP Huelsenbeck
K Mizuguchi
LA Stebbings
M Anisimova
M Pop
MR Aniba
P Gardner
RA Cartwright
RB Russell
RC Edgar
RC Edgar
SA Berger
SF Altschul
T Golubchik
T Koestler
T Lassmann
T Lassmann
T Lassmann
W Fletcher
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/11/2012
Field of study

Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies--based on simulation, consistency, protein structure, and phylogeny--and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should base their choice of benchmark depending on the context of application--with a keen awareness of the assumptions underlying each benchmarking strategy.Comment: Revie

arXiv.org e-Print Archive

Crossref

UCL Discovery

Methods to study splicing from high-throughput RNA Sequencing data

Author: A Ameur
A Bhasi
A Dobin
A Mortazavi
A Oshlack
A Roberts
A Roberts
AM Mezlini
AN Brooks
B Jackson
B Kakaradov
B Langmead
B Li
B Li
BJ Haas
BJ Haas
C Trapnell
C Trapnell
C Trapnell
D Hiller
D Singh
DL Wood
DW Bryant
E Eyras
E Lee
E Turro
ET Wang
F Birzele
F Bona De
F Denoeud
F Tang
G Robertson
G Xu
GA Sacomoto
GR Grant
GS Slater
H Bao
H Jiang
H Jiang
H Kim
H Richard
J Behr
J Du
J Feng
J Hu
J Lovén
J Martin
J Salzman
J Seok
J Seok
J Wu
J Wu
JE Allen
JJ Li
JP Venables
K Schneeberger
K Wang
KD Hansen
KF Au
KL Howe
KM Borgwardt
L Chen
L Chen
L Wang
L Wang
LY Chen
M Aschoff
M Fiume
M Garber
M Griffith
M Guttman
M Stanke
M Stanke
M Sultan
MC Ryan
MF Rogers
MG Grabherr
MH Schulz
MT Dimon
N Cloonan
N Cloonan
N Deng
N Leng
N Nicolae
N Philippe
N Vijay
NA Fonseca
O Stegle
P Drewe
P Glaus
PL Martelli
PP Labaj
Q Liu
Q Liu
Q Pan
QY Zhao
R Bohnert
R Guigó
R Li
S Anders
S Djebali
S Filichkin
S Heber
S Huang
S Lee
S Mangul
S Marco-Sola
S Shen
S Sonnenburg
S Srivastava
S Tang
S Zheng
SB Montgomery
SH Nagaraj
SK Lou
T Bonfert
TA Clark
TD Wu
TD Wu
W Li
W Li
W Wang
WJ Kent
Y Hu
Y Katz
Y Li
Y Liao
Y Surget-Groba
Y Xing
Y Xing
Y Zhang
Z Xia
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/07/2015
Field of study

The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful mean to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovering the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualizing splicing regulation. Various tools facilitate the visualization of the RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also attempt to propose a classification of the tools according to the operations they do, to facilitate the comparison and choice of methods.Comment: 31 pages, 1 figure, 9 tables. Small corrections adde

arXiv.org e-Print Archive

Crossref

Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction.

Author: Bourne Philip E
Scheeff Eric D
Publication venue: eScholarship, University of California
Publication date: 01/09/2006
Field of study

BackgroundOne of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, it would be expected that the use of protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear.ResultsWe explored several iterative protocols for the generation of profile hidden Markov models. These protocols were tailored to allow the inclusion of protein structure alignments in the process, and were used for large-scale creation and benchmarking of structure alignment-enhanced models. We found that models using structure alignments did not provide an overall improvement over sequence-only models for superfamily-level structure predictions. However, the results also revealed that the structure alignment-enhanced models were complimentary to the sequence-only models, particularly at the edge of the "twilight zone". When the two sets of models were combined, they provided improved results over sequence-only models alone. In addition, we found that the beneficial effects of the structure alignment-enhanced models could not be realized if the structure-based alignments were replaced with sequence-based alignments. Our experiments with different iterative protocols for sequence-only models also suggested that simple protocol modifications were unable to yield equivalent improvements to those provided by the structure alignment-enhanced models. Finally, we found that models using structure alignments provided fold-level structure assignments that were superior to those produced by sequence-only models.ConclusionWhen attempting to predict the structure of remote homologs, we advocate a combined approach in which both traditional models and models incorporating structure alignments are used

PubMed Central

eScholarship - University of California

The computer revolution in science: steps towards the realization of computer-supported discovery environments

Author: Jong Hidde de
Rip Arie
Publication venue: Elsevier
Publication date: 01/01/1997
Field of study

The tools that scientists use in their search processes together form so-called discovery environments. The promise of artificial intelligence and other branches of computer science is to radically transform conventional discovery environments by equipping scientists with a range of powerful computer tools including large-scale, shared knowledge bases and discovery programs. We will describe the future computer-supported discovery environments that may result, and illustrate by means of a realistic scenario how scientists come to new discoveries in these environments. In order to make the step from the current generation of discovery tools to computer-supported discovery environments like the one presented in the scenario, developers should realize that such environments are large-scale sociotechnical systems. They should not just focus on isolated computer programs, but also pay attention to the question how these programs will be used and maintained by scientists in research practices. In order to help developers of discovery programs in achieving the integration of their tools in discovery environments, we will formulate a set of guidelines that developers could follow

Elsevier - Publisher Connector

University of Twente Research Information

Recovering complete and draft population genomes from metagenome datasets.

Author: Gilbert Jack A
Sangwan Naseer
Xia Fangfang
Publication venue: eScholarship, University of California
Publication date: 01/03/2016
Field of study

Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution

Woods Hole Open Access Server

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Hyper‐Heuristics and Metaheuristics for Selected Bio‐Inspired Combinatorial Optimization Problems

Author: Swiercz Aleksandra
Publication venue: 'IntechOpen'
Publication date: 30/08/2017
Field of study

Many decision and optimization problems arising in bioinformatics field are time demanding, and several algorithms are designed to solve these problems or to improve their current best solution approach. Modeling and implementing a new heuristic algorithm may be time‐consuming but has strong motivations: on the one hand, even a small improvement of the new solution may be worth the long time spent on the construction of a new method; on the other hand, there are problems for which good‐enough solutions are acceptable which could be achieved at a much lower computational cost. In the first case, specially designed heuristics or metaheuristics are needed, while the latter hyper‐heuristics can be proposed. The paper will describe both approaches in different domain problems

IntechOpen

Crossref

The Phyre2 web portal for protein modeling, prediction and analysis

Author: A González-Pérez
A Lobley
A Marchler-Bauer
A Roy
AA Canutescu
BR Jefferys
C Mao
Christopher M Yates
CM Yates
CT Porter
DT Jones
DT Jones
EV Koonin
G Fucile
IA Adzhubei
IW Davis
J Moult
J Söding
JA Capra
JJ Ward
K Arnold
LA Kelley
Lawrence A Kelley
M Higurashi
M Källberg
M Remmert
Mark N Wass
Michael J E Sternberg
MN Wass
N Siew
Ngak-Leng Sim
P Rotkiewicz
P Schmidtke
R Arjun
S Raman
SF Altschul
Stefans Mezulis
TE Lewis
X Wei
Publication venue: Springer
Publication date: 01/05/2015
Field of study

Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large number of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission

Crossref

ZENODO

PubMed Central

Kent Academic Repository

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Spiral - Imperial College Digital Repository