
arXiv.org e-Print Archive

UCL Discovery

Killing Tensors and Conformal Killing Tensors from Conformal Killing Vectors

Author: Alan Barnes
Beem J K
Bell P
Benenti S
Benenti S
Carter B
Collinson C D
Daftardar V
Edgar S B
Geroch R
Hauser I
Joly G C
Kalnins E
Katzin G H
Kerr R P
Kimura M
Koutras A
Ludwig G
O'Connor J E R
Raffaele Rani
Rosquist K
S Brian Edgar
Sommers P
Stephani H
Thomas T Y
Thompson G
Walker M
Weir G J
Wolf T
Woodhouse N H J
Publication venue: IOP Publishing
Publication date: 16/01/2003

Koutras has proposed some methods to construct reducible proper conformal Killing tensors and Killing tensors (which are, in general, irreducible) when a pair of orthogonal conformal Killing vectors exists in a given space. We give the completely general result, demonstrating that this severe restriction of orthogonality is unnecessary. In addition, we correct and extend some results concerning Killing tensors constructed from a single conformal Killing vector. A number of examples demonstrate how it is possible to construct a much larger class of reducible proper conformal Killing tensors and Killing tensors than permitted by the Koutras algorithms. In particular, by showing that all conformal Killing tensors are reducible in conformally flat spaces, we have a method of constructing all conformal Killing tensors (including all the Killing tensors, which will in general be irreducible) of conformally flat spaces using their conformal Killing vectors. Comment: 18 pages. References added. Comments and reference to 2-dim case. Typos corrected.
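As a worked illustration of the kind of construction this abstract refers to, the short derivation below checks that the symmetrized product of two conformal Killing vectors satisfies the rank-2 conformal Killing tensor equation without any orthogonality assumption. The symbols (\chi_a, \xi_a, \phi, \psi, q_c) and the normalization conventions are assumptions chosen for this sketch and are not taken from the paper.

```latex
% Assumed notation: \chi_a and \xi_a are conformal Killing vectors
% with conformal factors \phi and \psi.
\nabla_{(a}\chi_{b)} = \phi\, g_{ab}, \qquad
\nabla_{(a}\xi_{b)} = \psi\, g_{ab}.

% A symmetric tensor K_{ab} is a conformal Killing tensor if, for some vector q_c,
\nabla_{(a}K_{bc)} = g_{(ab}\, q_{c)}.

% Take the symmetrized product K_{ab} = \chi_{(a}\xi_{b)} and apply the Leibniz rule:
\nabla_{(a}K_{bc)}
  = \bigl(\nabla_{(a}\chi_{b}\bigr)\,\xi_{c)} + \chi_{(b}\,\nabla_{a}\xi_{c)}
  = \phi\, g_{(ab}\,\xi_{c)} + \psi\, g_{(ab}\,\chi_{c)} .

% Hence the conformal Killing tensor equation holds with
q_c = \phi\,\xi_c + \psi\,\chi_c .
```

Nothing in this calculation uses \chi^a \xi_a = 0. In this notation, K_{ab} is an ordinary Killing tensor in the special case q_c = 0.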

CERN Document Server

Optimizing substitution matrix choice and gap parameters for sequence alignment

Author: CB Do
CB Do
CN Dewey
D Gusfield
DT Jones
E Kim
G Blackshields
GA Price
GH Gonnet
I Van Walle
J Flannick
J Kececioglu
J Pei
JD Thompson
JD Thompson
JG Henikoff
K Katoh
M Box
MA Larkin
MO Dayhoff
MP Styczynski
MS Waterman
O Chapelle
RC Edgar
RC Edgar
Robert C Edgar
S Henikoff
T Lassmann
T Muller
T Muller
TM Phuong
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: While substitution matrices can readily be computed from reference alignments, it is challenging to compute optimal or approximately optimal gap penalties. It is also not well understood which substitution matrices are the most effective when alignment accuracy is the goal rather than homolog recognition. Here a new parameter optimization procedure, POP, is described and applied to the problems of optimizing gap penalties and selecting substitution matrices for pair-wise global protein alignments. Results: POP is compared to a recent method due to Kim and Kececioglu and found to achieve from 0.2% to 1.3% higher accuracies on pair-wise benchmarks extracted from BALIBASE. The VTML matrix series is shown to be the most accurate on several global pair-wise alignment benchmarks, with VTML200 giving the best or close to the best performance in all tests. BLOSUM matrices are found to be slightly inferior, even with the marginal improvements in the bug-fixed RBLOSUM series. The PAM series is significantly worse, giving accuracies typically 2% less than VTML. Integer rounding is found to cause slight degradations in accuracy. No evidence is found that selecting a matrix based on sequence divergence improves accuracy, suggesting that the use of this heuristic in CLUSTALW may be ineffective. Using VTML200 is found to improve the accuracy of CLUSTALW by 8% on BALIBASE and 5% on PREFAB. Conclusion: The hypothesis that more accurate alignments of distantly related sequences may be achieved using low-identity matrices is shown to be false for commonly used matrix types. Source code and test data are freely available from the author's web site at http://www.drive5.com/pop.
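To make concrete where a substitution matrix and gap penalties enter a pair-wise global alignment, here is a minimal sketch of an affine-gap global alignment score (a Gotoh-style dynamic program). It is not the POP procedure and not code from the paper. The function names, the toy match/mismatch scorer standing in for a real matrix such as VTML200, and the penalty values are all assumptions made for illustration.

```python
from typing import Callable

NEG_INF = float("-inf")


def global_alignment_score(
    a: str,
    b: str,
    subst: Callable[[str, str], float],
    gap_open: float,
    gap_extend: float,
) -> float:
    """Optimal global alignment score with affine gap penalties.

    A gap of length k is charged gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    # M: last column aligns a[i-1] with b[j-1]
    # X: last column aligns a[i-1] with a gap
    # Y: last column aligns a gap with b[j-1]
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = subst(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]
            )
            # Extending an existing gap costs gap_extend; starting one costs gap_open.
            X[i][j] = max(
                M[i - 1][j] - gap_open,
                X[i - 1][j] - gap_extend,
                Y[i - 1][j] - gap_open,
            )
            Y[i][j] = max(
                M[i][j - 1] - gap_open,
                Y[i][j - 1] - gap_extend,
                X[i][j - 1] - gap_open,
            )
    return max(M[n][m], X[n][m], Y[n][m])


def toy_subst(x: str, y: str) -> float:
    """Hypothetical stand-in for a real substitution matrix lookup."""
    return 1.0 if x == y else -1.0


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT", toy_subst, gap_open=2.0, gap_extend=1.0))
```

Parameter optimization in the sense of the abstract would treat `subst`, `gap_open`, and `gap_extend` as the quantities to tune against reference alignments; this sketch only evaluates one fixed choice.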

State of the art: refinement of multiple sequence alignments

Author: A Marchler-Bauer
AB Robinson
AJ Jennings
Anna R Panchenko
C Notredame
C Notredame
CB Do
Christopher J Lanczycki
GJ Barton
IM Wallace
J Chen
J Heringa
J Heringa
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JF Gibrat
K Katoh
K Katoh
O Gotoh
Paul A Thiessen
RC Edgar
S Chakrabarti
Saikat Chakrabarti
SR Eddy
Stephen H Bryant
T Lassmann
T Madej
Teresa M Przytycka
WR Taylor
WS Valdar
Y Wang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Accurate multiple sequence alignments of proteins are very important in computational biology today. Despite the numerous efforts made in this field, all alignment strategies have certain shortcomings, resulting in alignments that are not always correct. Refinement of an existing alignment can prove to be an intelligent choice considering the increasing importance of high quality alignments in large scale high-throughput analysis. RESULTS: We provide an extensive comparison of the performance of the alignment refinement algorithms. The accuracy and efficiency of the refinement programs are compared using the 3D structure-based alignments in the BAliBASE benchmark database as well as manually curated high quality alignments from the Conserved Domain Database (CDD). CONCLUSION: Comparison of performance for refined alignments revealed that, despite the absence of dramatic improvements, our refinement method, REFINER, which uses conserved regions as constraints, performs better in improving the alignments generated by different alignment algorithms. In most cases REFINER produces a higher-scoring, modestly improved alignment that does not deteriorate the well-conserved regions of the original alignment.

Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost

Author: A Bahr
AR Subramanian
C Grasso
C Notredame
C Notredame
CB Do
Hayato Yamana
J Kececioglu
JD Thompson
JD Thompson
JD Thompson
JD Thompson
K Karplus
K Katoh
K Katoh
MA McClure
O Gotoh
O Gotoh
O Gotoh
O Gotoh
O Gotoh
O Gotoh
O Gotoh
O Gotoh
O O'Sullivan
Osamu Gotoh
RC Edgar
Shinsuke Yamada
T Jiang
W Miller
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Multiple sequence alignment (MSA) is a useful tool in bioinformatics. Although many MSA algorithms have been developed, there is still room for improvement in accuracy and speed. In the alignment of a family of protein sequences, global MSA algorithms perform better than local ones in many cases, while local ones perform better than global ones when some sequences have long insertions or deletions (indels) relative to others. Many recent leading MSA algorithms have incorporated pairwise alignment information obtained from a mixture of sources into their scoring system to improve accuracy of alignment containing long indels. RESULTS: We propose a novel group-to-group sequence alignment algorithm that uses a piecewise linear gap cost. We developed a program called PRIME, which employs our proposed algorithm to optimize the well-defined sum-of-pairs score. PRIME stands for Profile-based Randomized Iteration MEthod. We evaluated PRIME and some recent MSA programs using BAliBASE version 3.0 and PREFAB version 4.0 benchmarks. The results of benchmark tests showed that PRIME can construct accurate alignments comparable to the most accurate programs currently available, including L-INS-i of MAFFT, ProbCons, and T-Coffee. CONCLUSION: PRIME enables users to construct accurate alignments without having to employ pairwise alignment information. PRIME is available at
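The two ingredients named in this abstract, a piecewise linear gap cost and the sum-of-pairs score, can be sketched as follows. This is not PRIME's implementation. The particular cost form (the minimum over several affine pieces, which gives a concave piecewise linear function), the piece parameters, and the toy match/mismatch scorer are assumptions chosen only to illustrate the idea that long indels are charged less per position than short ones.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

# Hypothetical (open, extend) pieces: steep for short gaps, shallow for long ones.
PIECES: Tuple[Tuple[float, float], ...] = ((10.0, 1.0), (16.0, 0.25))


def piecewise_linear_gap_cost(
    length: int, pieces: Sequence[Tuple[float, float]] = PIECES
) -> float:
    """Cost of a gap of the given length: the cheapest affine piece wins."""
    if length <= 0:
        return 0.0
    return min(opening + extend * length for opening, extend in pieces)


def pair_score(
    row_a: str,
    row_b: str,
    subst: Callable[[str, str], float],
    gap_cost: Callable[[int], float],
) -> float:
    """Score of the pairwise alignment induced by two rows of an MSA."""
    score = 0.0
    run_a = run_b = 0  # lengths of the currently open gap runs in each row
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            continue  # the column disappears from the induced pairwise alignment
        if x == "-":
            score -= gap_cost(run_b)  # a gap run in row_b (if any) ends here
            run_b = 0
            run_a += 1
        elif y == "-":
            score -= gap_cost(run_a)
            run_a = 0
            run_b += 1
        else:
            score -= gap_cost(run_a) + gap_cost(run_b)
            run_a = run_b = 0
            score += subst(x, y)
    return score - gap_cost(run_a) - gap_cost(run_b)


def sum_of_pairs(
    msa: Sequence[str],
    subst: Callable[[str, str], float],
    gap_cost: Callable[[int], float] = piecewise_linear_gap_cost,
) -> float:
    """Sum of induced pairwise scores over all pairs of rows."""
    return sum(pair_score(a, b, subst, gap_cost) for a, b in combinations(msa, 2))


def toy_subst(x: str, y: str) -> float:
    """Hypothetical stand-in for a real substitution matrix lookup."""
    return 1.0 if x == y else -1.0


if __name__ == "__main__":
    print(sum_of_pairs(["AC-GT", "ACGGT", "A--GT"], toy_subst))
```

With these assumed pieces, a gap of length 1 costs 11 while a gap of length 20 costs 21 rather than 30, so a single long indel is penalized far less than under the steep piece alone.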

Species-level functional profiling of metagenomes and metatranscriptomes.

Author: A Sczyrba
A Shafquat
AE Duran-Pinedo
AK Sharma
B Buchfink
B Langmead
BE Suzek
BK Swan
C Burke
C Luo
Curtis Huttenhower
D Medini
DH Huson
DT Truong
DT Truong
E Pasolli
EA Franzosa
EA Franzosa
Eric A. Franzosa
George Weingart
GG Silva
Gholamali Rahnavard
H Hauswedell
J Kim
J Lloyd-Price
J Lloyd-Price
J Ravel
J. Gregory Caporaso
JA Fuhrman
K Huang
Karen Schwarzberg Lipson
Lauren J. McIver
LR Thompson
LR Thompson
Luke R. Thompson
M Hamady
M Kanehisa
M Scholz
Melanie Schirmer
MY Galperin
N Segata
N Segata
Nicola Segata
OU Mason
P Petrenko
PJ Turnbaugh
R Caspi
RC Edgar
RD Finn
Rob Knight
S Abubucker
S Nayfach
S Sunagawa
S Sunagawa
T Bose
UniProt Consortium.
W Huang
Y Ye
Y Zhao
Publication venue: eScholarship, University of California
Publication date: 01/11/2018

Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types.

eScholarship - University of California

Improving the Alignment Quality of Consistency Based Aligners with an Evaluation Function Using Synonymous Protein Words

Author: AR Panchenko
B Morgenstern
B Rost
C Chothia
C Kemena
C Notredame
CB Do
Cédric Notredame
D Baker
DG Higgins
DT Jones
Eugene A. Permyakov
F Armougom
G Yona
GH Gonnet
H-N Lin
Hsin-Nan Lin
HY Zhou
HY Zhou
J Skolnick
J Soding
JD Thompson
Jia-Ming Chang
JM Pei
JM Pei
JM Pei
K Katoh
L Rychlewski
L Wang
LA Kelley
MJ Sternberg
MO Dayhoff
O O'Sullivan
P Hogeweg
R Hagopian
R Sadreyev
RC Edgar
RC Edgar
RC Edgar
RC Edgar
RC Edgar
RC Edgar
S Henikoff
SF Altschul
SF Altschul
T Hara
T Müller
Ting-Yi Sung
U Roshan
VA Simossis
W Kabsch
W Kabsch
Wen-Lian Hsu
Y Zhang
Y Zhang
Y Zhang
Publication venue: Public Library of Science
Publication date: 02/12/2011

Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or consistency-based approaches, more recently.

Public Library of Science (PLOS)

Accelerated large-scale multiple sequence alignment

Author: A Szalkowski
A Wilm
A Wirawan
AV Bhatt
C Grasso
C Notredame
D Mikhailov
DF Feng
E Eskin
G Tan
GM Amdahl
H Carroll
H Vandierendonck
I Letunic
J Cheetham
J Ebedes
J Nickolls
JD Thompson
JD Thompson
JD Thompson
K Katoh
KB Li
M Farrar
M Feldman
M Friedman
OpenMP
Quinn O Snell
RC Edgar
S Lloyd
S Washietl
Scott Lloyd
SR Eddy
T Lassmann
T Oliver
T Ramdas
T Wang
X Deng
X Lin
Y Li
Y Liu
Y Liu
Publication venue: BioMed Central
Publication date: 01/12/2011

Background: Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations according to Amdahl's Law. This work is the first known to accelerate the third stage of progressive alignment on reconfigurable hardware. Results: We reduce subgroups of aligned sequences into discrete profiles before they are pairwise aligned on the accelerator. Using an FPGA accelerator, an overall speedup of up to 150 has been demonstrated on a large data set when compared to a 2.4 GHz Core2 processor. Conclusions: Our parallel algorithm and architecture accelerate large-scale MSA with reconfigurable computing and allow researchers to solve the larger problems that confront biologists today. Program source is available from http://dna.cs.byu.edu/msa/.
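The Amdahl's Law limitation mentioned in this abstract can be made concrete with a small calculation: if only a fraction of the total run time is accelerated, the overall speedup is bounded by the part left unaccelerated. The function name and the numbers in the usage example are illustrative assumptions, not measurements from the paper.

```python
def amdahl_speedup(accelerated_fraction: float, stage_speedup: float) -> float:
    """Overall speedup when a fraction of the run time is sped up by stage_speedup.

    Amdahl's Law: S = 1 / ((1 - p) + p / s).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if stage_speedup <= 0.0:
        raise ValueError("stage_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / stage_speedup)


if __name__ == "__main__":
    # Illustrative values only: accelerating half of the work by a huge factor
    # still caps the overall speedup near 2.
    for p in (0.5, 0.9, 0.99):
        print(p, round(amdahl_speedup(p, 1000.0), 2))
```

This is why accelerating only one stage of a multi-stage pipeline gives limited gains: the stages left on the host processor dominate the remaining run time.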

CLUSS: Clustering of protein sequences based on a new similarity measure

Author: A Krause
Abdellali Kelil
AJ Enright
Alain Fleury
C Notredame
D Higgins
ELL Sonnhammer
F Titgemeyer
G Reinert
G Yona
H Lodish
IV Tetko
J Felsenstein
J Heringa
J Rocha
JD Thompson
JD Thompson
JH Ward
JH Ward
JS Varré
K Katoh
K Sjölander
K Sjölander
M Ike
M Kimura
MO Dayhoff
MY Leung
N Côté
N Wicker
P Pipenbacher
R Jothi
RC Edgar
RC Edgar
RO Duda
Ryszard Brzezinski
S Fanning
S Henikoff
S Karlin
S Karlin
S Karlin
S Vinga
SF Altschul
SF Altschul
Shengrui Wang
T Fukamizo
T Ishimizu
V Batagelj
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The rapid burgeoning of available protein data makes the use of clustering within families of proteins increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects the biological reality, and an accurate estimate of protein sequence similarity is crucial to the building of such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for the alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is especially designed for application to non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "phylogenetic" in the sense of "relatedness of biological functions". Results: To show the effectiveness of CLUSS, we performed an extensive clustering on the COG database. To demonstrate its ability to deal with hard-to-align sequences, we tested it on the GH2 family. In addition, we carried out experimental comparisons of CLUSS with a variety of mainstream algorithms. These comparisons were made on hard-to-align and easy-to-align protein sequences. The results of these experiments show the superiority of CLUSS in yielding clusters of proteins with similar functional activity. Conclusion: We have developed an effective method and tool for clustering protein sequences to meet the needs of biologists in terms of phylogenetic analysis and prediction of biological functions. Compared to existing clustering methods, CLUSS more accurately highlights the functional characteristics of the clustered families. It provides biologists with a new and plausible instrument for the analysis of protein sequences, especially those that cause problems for the alignment-dependent algorithms.

MSACompro: protein multiple sequence alignment using predicted secondary structure, solvent accessibility, and residue-residue contacts

Author: A Krogh
AN Tegge
C Notredame
CB Do
DF Feng
DG Higgins
DG Higgins
DG Higgins
DG Higgins
F Jeanmougin
F Wilcoxon
G Pollastri
GH Gonnet
GJ Barton
GP Raghava
GP Raghava
HY Zhou
J Cheng
J Heringa
J Pei
J Pei
J Pei
J Söding
J Söding
JD Thompson
JD Thompson
JD Thompson
JD Thompson
Jianlin Cheng
K Katoh
M Brudno
M Larkin
NK Kim
NS Boutonnet
O Poirot
O Poirot
PHA Sneath
R Chenna
R Durbin
RC Edgar
RC Edgar
RK Bradley
RS Amarendran
RS Amarendran
RS Amarendran
S Chikkagoudar
SE Brenner
SH Sze
T Kawabata
TL Bailey
U Roshan
V Walle
V Walle
Xin Deng
YC Liu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Multiple sequence alignment (MSA) is a basic tool for bioinformatics research and analysis. It is used in almost all bioinformatics tasks, such as protein structure modeling, gene and protein function prediction, DNA motif recognition, and phylogenetic analysis. Therefore, improving the accuracy of multiple sequence alignment is important for advancing many bioinformatics fields. Results: We designed and developed a new method, MSACompro, to synergistically incorporate predicted secondary structure, relative solvent accessibility, and residue-residue contact information into the currently most accurate posterior probability-based MSA methods to improve the accuracy of multiple sequence alignments. The method differs from multiple sequence alignment methods (e.g. 3D-Coffee) that use the tertiary structure information of some sequences, since the structural information used by our method is fully predicted from sequences. To the best of our knowledge, applying predicted relative solvent accessibility and contact maps to multiple sequence alignment is novel. Rigorous benchmarking of our method on standard benchmarks (i.e. BAliBASE, SABmark and OXBENCH) clearly demonstrated that incorporating predicted protein structural information improves multiple sequence alignment accuracy over leading multiple protein sequence alignment tools that do not use this information, such as MSAProbs, ProbCons, Probalign, T-coffee, MAFFT and MUSCLE. The performance of the method is comparable to that of the state-of-the-art method PROMALS, which uses structural features and additional homologous sequences, with only slightly lower scores. Conclusion: MSACompro is an efficient and reliable multiple protein sequence alignment tool that can effectively incorporate predicted protein structural information into multiple sequence alignment. The software is available at http://sysbio.rnet.missouri.edu/multicom_toolbox/.