Word correlation matrices for protein sequence analysis and remote homology detection

Authors: A Ben-Hur, A Krogh, AG Murzin, C Leslie, CS Leslie, G Cohen, H Rangwala, H Saigo, J Park, L Liao, O Chapelle, Peter Meinicke, QW Dong, R Finn, R Kuang, SF Altschul, T Jaakkola, T Lingner, TF Smith, Thomas Lingner, UniProtConsortium
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Classification of protein sequences is a central problem in computational biology. Currently, among computational methods, discriminative kernel-based approaches provide the most accurate results. However, kernel-based methods often lack an interpretable model for analysis of discriminative sequence features, and predictions on new sequences are usually computationally expensive. Results: In this work we present a novel kernel for protein sequences based on average word similarity between two sequences. We show that this kernel gives rise to a feature space that allows analysis of discriminative features and fast classification of new sequences. We demonstrate the performance of our approach on a widely used benchmark setup for protein remote homology detection. Conclusion: Our word correlation approach provides highly competitive performance compared with state-of-the-art methods for protein remote homology detection. The learned model is interpretable in terms of biologically meaningful features. In particular, analysis of discriminative words allows the identification of characteristic regions in biological sequences. Because of its high computational efficiency, our method can be applied to ranking of potential homologs in large databases.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

How predation and landscape fragmentation affect vole population dynamics

Authors: A Myllymaki, B Hörnfeldt, B Martinsson, C Elton, Chris J. Topping, CJ Krebs, CJ Topping, DGL Innes, DK Hendrichsen, DL DeAngelis, DR Cope, E Korpimäki, F Courchamp, GEP Box, H Ylonen, HR Pulliam, I Hanski, J Agrell, J Gurevitch, J Nabe-Nielsen, J Nelson, J Sundell, J Viitala, JD Murray, JM Chase, K Norrdahl, L Fahrig, L Hansson, L Jiang, M Begon, M Lima, MA Brockhurst, MB Bonsall, MG Turner, MJ Smith, MK Oli, NC Stenseth, O Gilg, O Huitu, ON Bjørnstad, P Kindlmann, P Turchin, PA Stephens, PH Leslie, Richard M. Sibly, RM May, RM Sibly, RS Ostfeld, S Erlinge, SP Ellner, T Dalkvist, T Klemola, T Royama, T Saitoh, TF Hansen, Trine Dalkvist, TS Jensen, V Grimm, Wayne M. Getz, WW Murdoch, X Lambin
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2011

Background: Microtine species in Fennoscandia display a distinct north-south gradient from regular cycles to stable populations. The gradient has often been attributed to changes in the interactions between microtines and their predators. Although the spatial structure of the environment is known to influence predator-prey dynamics of a wide range of species, it has scarcely been considered in relation to the Fennoscandian gradient. Furthermore, the length of microtine breeding season also displays a north-south gradient. However, little consideration has been given to its role in shaping or generating population cycles. Because these factors covary along the gradient it is difficult to distinguish their effects experimentally in the field. The distinction is here attempted using realistic agent-based modelling. Methodology/Principal Findings: By using a spatially explicit computer simulation model based on behavioural and ecological data from the field vole (Microtus agrestis), we generated a number of repeated time series of vole densities whose mean population size and amplitude were measured. Subsequently, these time series were subjected to statistical autoregressive modelling, to investigate the effects on vole population dynamics of making predators more specialised, of altering the breeding season, and increasing the level of habitat fragmentation. We found that fragmentation as well as the presence of specialist predators are necessary for the occurrence of population cycles. Habitat fragmentation and predator assembly jointly determined cycle length and amplitude. Length of vole breeding season had little impact on the oscillations. Significance: There is good agreement between our results and the experimental work from Fennoscandia, but our results allow distinction of causation that is hard to unravel in field experiments. We hope our results will help understand the reasons for cycle gradients observed in other areas. 
Our results clearly demonstrate the importance of landscape fragmentation for population cycling, and we recommend that the degree of fragmentation be more fully considered in future analyses of vole dynamics.

Central Archive at the University of Reading

Public Library of Science (PLOS)

Crossref

Roskilde Universitet

Directory of Open Access Journals

PubMed Central

Physicochemical property distributions for accurate and rapid pairwise protein homology detection

Authors: A Ben-Hur, A Kumar, AG Murzin, AR Shah, B Liu, BJ Webb-Robertson, Bobbie-Jo M Webb-Robertson, C Leslie, Christopher S Oehmen, CS Leslie, H Rangwala, H Saigo, I Jung, I Melvin, J Weston, Kyle G Ratuiste, L Liao, NH Anderson, QW Dong, R Kuang, S Hochreiter, SF Altschul, T Damoulas, T Lingner, TF Smith, WS Noble, Y Hou, Y Yang, Y Yuan
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection. Results: We introduce a new method for feature vector representation based on the physicochemical properties of the primary protein sequence. A distribution of physicochemical property scores is assembled from 4-mers of the sequence and normalized based on the null distribution of the property over all possible 4-mers. With this approach there is little computational cost associated with the transformation of the protein into feature space, and overall performance in terms of remote homology detection is comparable with current state-of-the-art methods. We demonstrate that the features can be used for the task of pairwise remote homology detection with improved accuracy versus sequence-based methods such as BLAST and other feature-based methods of similar computational cost. Conclusions: A protein feature method based on physicochemical properties is a viable approach for extracting features in a computationally inexpensive manner while retaining the sensitivity of SVM protein homology detection. Furthermore, identifying features that can be used for generic pairwise homology detection in lieu of family-based homology detection is important for applications such as large database searches and comparative genomics.
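
The feature construction summarized in this abstract (property scores over 4-mers, normalized against the null distribution over all possible 4-mers) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Kyte-Doolittle hydropathy scale stands in for the physicochemical property, a 4-mer is scored by summing its residue values, normalization is a z-score, and the bin edges are arbitrary. The function names `null_mean_std` and `property_distribution` are hypothetical.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy values (assumed example property, not the paper's choice).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def null_mean_std(scale, k=4):
    """Mean and standard deviation of the summed property over all possible k-mers.

    Enumerating all 20**k k-mers is equivalent to drawing each position
    independently and uniformly, so the sum has mean k*mu and variance
    k*sigma**2, where mu and sigma describe the 20 single-residue values.
    """
    values = list(scale.values())
    return k * mean(values), (k ** 0.5) * pstdev(values)


def property_distribution(seq, scale=KD, k=4, edges=(-2, -1, 0, 1, 2)):
    """Normalized histogram of z-scored k-mer property sums for one sequence.

    The outer bins are open-ended, so the result has len(edges) + 1 entries.
    Residues missing from `scale` raise KeyError.
    """
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    mu, sd = null_mean_std(scale, k)
    counts = [0] * (len(edges) + 1)
    for i in range(len(seq) - k + 1):
        score = sum(scale[residue] for residue in seq[i:i + k])
        z = (score - mu) / sd
        counts[sum(z >= edge for edge in edges)] += 1
    total = len(seq) - k + 1
    return [c / total for c in counts]


features = property_distribution("MKTAYIAKQRQISFVKSHFSRQ")
print(features)
```

The resulting fixed-length vector could then be passed to any SVM implementation; that step is omitted because the abstract does not specify it.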

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Building multiclass classifiers for remote homology detection and fold recognition

Authors: A Heger, A Krogh, A Sun, AG Murzin, B Taskar, C Leslie, CA Orengo, CD Huang, CH Ding, D Mittelman, E le, E Lindahl, EL Allwein, F Aiolli, F Rosenblatt, George Karypis, H Rangwala, H Saigo, Huzefa Rangwala, I Tsochantaridis, J Rousu, J Shi, J Weston, K Crammer, L Holm, L Liao, M Collins, M Marti-Renom, P Baldi, R Kuang, R Rifkin, S Altschul, SB Needleman, SE Brenner, T Jaakkola, T Joachims, TF Smith, TG Dietterich, V Vapnik, W Pearson, Y Guermeur, Y Hou, Z Barutcuoglu
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Protein remote homology detection and fold recognition are central problems in computational biology. Supervised learning algorithms based on support vector machines are currently among the most effective methods for solving these problems. These methods are primarily used to solve binary classification problems and have not been extensively used to solve the more general multiclass remote homology prediction and fold recognition problems. RESULTS: We present a comprehensive evaluation of a number of methods for building SVM-based multiclass classification schemes in the context of the SCOP protein classification. These methods include schemes that directly build an SVM-based multiclass model, schemes that employ a second-level learning approach to combine the predictions generated by a set of binary SVM-based classifiers, and schemes that build and combine binary classifiers for various levels of the SCOP hierarchy beyond those defining the target classes. CONCLUSION: Analyzing the performance achieved by the different approaches on four different datasets, we show that most of the proposed multiclass SVM-based classification approaches are quite effective in solving the remote homology prediction and fold recognition problems. Schemes that use predictions from binary models constructed for ancestral categories within the SCOP hierarchy tend not only to lead to lower error rates but also to reduce the number of errors in which a superfamily is assigned to an entirely different fold and a fold is predicted as being from a different SCOP class. Our results also show that the limited size of the training data makes it hard to learn complex second-level models, and that models of moderate complexity lead to consistently better results.

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Minnesota Digital Conservancy

A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis

Authors: A Ben-Hur, A Floratos, AR Shah, B Qian, B Rost, B-J Webb-Robertson, Bin Liu, C Leslie, CG Nevill-Manning, CS Leslie, H Ogul, H Rangwala, H Saigo, I Rigoutsos, J Bellegarda, J Shawe-Taylor, K Karplus, L Holm, L Liao, Lei Lin, M Ganapathiraju, M Gribskov, Q Dong, Qiwen Dong, QJ Su, QW Dong, R Kuang, S Henikoff, SE Brenner, SE Dowd, SF Altschul, T Damoulas, T Håndstad, T Jaakkola, T Lingner, TF Smith, TK Landauer, TL Bailey, VN Vapnik, WR Pearson, WS Noble, Xiaolong Wang, Xuan Wang, Y Hou, Y Yang
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machines (SVMs) are the most effective and accurate methods for solving these problems. A key step in improving the performance of SVM-based methods is to find a suitable representation of protein sequences. Results: In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from protein sequence frequency profiles. The frequency profiles are calculated from the multiple sequence alignments output by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by counting the occurrences of each Top-n-gram. The training vectors are used to train SVM classifiers, which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results than related methods. Conclusion: The method based on Top-n-grams significantly outperforms methods based on many other building blocks, including N-grams, patterns, motifs and binary profiles. Therefore, the Top-n-gram is a good building block of protein sequences and can be widely used in many computational biology tasks, such as sequence alignment, the prediction of domain boundaries, the designation of knowledge-based potentials and the prediction of protein binding sites.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Population screening for colorectal cancer: the implications of an ageing population

Authors: A Frazier, A Leslie, AL Frazier, BP Towler, C Dukes, D A L Macafee, D K Whynes, DB Nelson, DK Whynes, DM Eddy, F Safi, J Faive, J H Scholefield, J Karnon, J Kewenter, J Oeppen, JD Hardcastle, JH Scholefield, JJ Smith, JN Lund, JS Mandel, L Garvican, M Waller, MA Memon, MF Drummond, MJ Buxton, O Kronborg, PA Farrands, S Dube, S Moss, SJ Winawer, TF Imperiale
Publication venue: Nature Publishing Group
Publication date: 09/12/2008

Population screening for colorectal cancer (CRC) has recently commenced in the United Kingdom, supported by the evidence of a number of randomised trials and pilot studies. Certain factors are known to influence screening cost-effectiveness (e.g. compliance), but it remains unclear whether an ageing population (i.e. demographic change) might also have an effect. The aim of this study was to simulate a population-based screening setting using a Markov model and assess the effect of increasing life expectancy on CRC screening cost-effectiveness. A Markov model was constructed that aimed, using a cohort simulation, to estimate the cost-effectiveness of CRC screening in an England and Wales population for two timescales: 2003 (early cohort) and 2033 (late cohort). Four model outcomes were calculated: screened and non-screened cohorts in 2003 and 2033. The screened cohort of men and women aged 60 years was offered biennial unhydrated faecal occult blood testing until the age of 69 years. Life expectancy was assumed to increase by 2.5 years per decade. There were 407 552 fewer people entering the model in the 2033 model due to a lower birth cohort, and population screening saw 30 345 fewer CRC-related deaths over the 50 years of the model. Screening the 2033 cohort cost £96 million with cost savings of £43 million in terms of detection and treatment and £28 million in palliative care costs. After 30 years of follow-up, the cost per life year saved was £1544. An identical screening programme in an early cohort (2003) saw a cost per life year saved of £1651. Population screening for CRC is costly but enables cost savings in certain areas and a considerable reduction in mortality from CRC. This Markov simulation suggests that the cost-effectiveness of population screening for CRC in the United Kingdom may actually be improved by rising life expectancies.
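
For readers unfamiliar with the method named in this abstract, a Markov cohort simulation and the cost per life year saved can be sketched as follows. This is a deliberately simplified, hypothetical three-state model (well, CRC, dead) and is not the authors' model: every transition probability and cost below is an invented placeholder, screening is crudely represented as a yearly cost plus a lower CRC death probability, and the function names `run_cohort` and `cost_per_life_year_saved` are assumptions.

```python
def run_cohort(n_people, cycles, p_crc, p_die_other, p_die_crc,
               cost_well=0.0, cost_crc=0.0):
    """Advance a cohort through yearly cycles of a three-state Markov model.

    Returns (life_years, total_cost, (well, crc, dead)).
    All parameters are illustrative placeholders, not published estimates.
    """
    well, crc, dead = float(n_people), 0.0, 0.0
    life_years = 0.0
    total_cost = 0.0
    for _ in range(cycles):
        # Everyone alive at the start of the cycle accrues one life year.
        life_years += well + crc
        total_cost += well * cost_well + crc * cost_crc
        # Expected flows between states during this cycle.
        new_crc = well * p_crc
        well_deaths = well * p_die_other
        crc_deaths = crc * p_die_crc
        well -= new_crc + well_deaths
        crc += new_crc - crc_deaths
        dead += well_deaths + crc_deaths
    return life_years, total_cost, (well, crc, dead)


def cost_per_life_year_saved(screened, unscreened):
    """Incremental cost divided by incremental life years."""
    extra_cost = screened[1] - unscreened[1]
    extra_life_years = screened[0] - unscreened[0]
    return extra_cost / extra_life_years


# Hypothetical inputs: screening adds a yearly cost per well person and
# lowers the yearly probability of dying once CRC has developed.
unscreened = run_cohort(1000, 30, p_crc=0.002, p_die_other=0.02,
                        p_die_crc=0.30, cost_well=0.0, cost_crc=5000.0)
screened = run_cohort(1000, 30, p_crc=0.002, p_die_other=0.02,
                      p_die_crc=0.20, cost_well=10.0, cost_crc=5000.0)
print(cost_per_life_year_saved(screened, unscreened))
```

Comparing an early and a late cohort, as the study does, would amount to running the same pair of simulations with different mortality inputs; the figures printed by this sketch carry no empirical meaning.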

Crossref

PubMed Central

Institute of Cancer Research Repository

University of Queensland eSpace

Antimalarial Therapy Selection for Quinolone Resistance among Escherichia coli in the Absence of Quinolone Exposure, in Tropical South America

Authors: A Hakanen, A Liesgang, Allison McGeer, Barbara M. Willey, C Hao, CA Hart, Dennis Scolnik, E Lautenbach, FC Tenover, G Ciarrocchi, G Ruiz-Irastorza, GA Jacoby, GY Lesher, H Goosens, Ian Davis, IN Okeke, J Turnidge, Jane Polsky, Keyro Rizg, KJ Arrow, KT Smith, L Anselmo, LM Weigel, LR Lard, Michael S. Silverman, MM Neuhauser, Nick Daneman, NP Brenwald, Olga Imas, P Popelka, Paul Yang, PC Applebaum, PK Lindgren, Robert Frenck, Ross J. Davidson, Roy Rowsell, S Nys, Shelly Bolotin, SK Jain, SN Cohen, T James, T Leslie, TE Willems, TF Byrd, TJ Barrett, Vanessa Porter
Publication venue: Public Library of Science
Publication date: 16/07/2008

BACKGROUND: Bacterial resistance to antibiotics is thought to develop only in the presence of antibiotic pressure. Here we show evidence to suggest that fluoroquinolone resistance in Escherichia coli has developed in the absence of fluoroquinolone use. METHODS: Over 4 years, outreach clinic attendees in one moderately remote and five very remote villages in rural Guyana were surveyed for the presence of rectal carriage of ciprofloxacin-resistant gram-negative bacilli (GNB). Drinking water was tested for the presence of resistant GNB by culture, and for the presence of antibacterial agents and chloroquine by HPLC. The development of ciprofloxacin resistance in E. coli was examined after serial exposure to chloroquine. Patient and laboratory isolates of E. coli resistant to ciprofloxacin were assessed by PCR-sequencing for quinolone-resistance-determining-region (QRDR) mutations. RESULTS: In the very remote villages, 4.8% of patients carried ciprofloxacin-resistant E. coli with QRDR mutations despite no local availability of quinolones. However, there had been extensive local use of chloroquine, with higher prevalence of resistance seen in the villages shortly after a Plasmodium vivax epidemic (p<0.01). Antibacterial agents were not found in the drinking water, but chloroquine was demonstrated to be present. Chloroquine was found to inhibit the growth of E. coli in vitro. Replica plating demonstrated that 2-step QRDR mutations could be induced in E. coli in response to chloroquine. CONCLUSIONS: In these remote communities, the heavy use of chloroquine to treat malaria likely selected for ciprofloxacin resistance in E. coli. This may be an important public health problem in malarious areas.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Complexity of the Inoculum Determines the Rate of Reversion of SIV Gag CD8 T Cell Mutant Virus and Outcome of Infection

Authors: AJ Leslie, B Asquith, B Li, BF Keele, BF Pratt, CA Derdeyn, Caroline S. Fernandez, CJ Dale, CS Fernandez, Damian F. J. Purcell, DP Wilson, DR Chopera, EM Long, EW Fiebig, GH Learn, H Crawford, J Grobler, J Petravic, Jane Howard, Janka Petravic, Jeanette C. Reece, JF Salazar-Gonzalez, JS Gibbs, K Ritola, L Loh, Liyen Loh, M Altfeld, M Sagar, ME van der Ende, Mehala Balamurali, Miles P. Davenport, MP Davenport, MZ Smith, PA Goepfert, PJ Goulder, R De Rose, Robert Center, Sheilajen Alcantara, SJ Kent, SM Wolinsky, Stephen J. Kent, Susan Ross, T Zhu, TC Friedrich, TF Wolfs, TM Allen, V Liska
Publication venue: Public Library of Science
Publication date: 01/01/2009

Escape mutant (EM) virus that evades CD8+ T cell recognition is frequently observed following infection with HIV-1 or SIV. This EM virus is often less replicatively “fit” compared to wild-type (WT) virus, as demonstrated by reversion to WT upon transmission of HIV to a naïve host and the association of EM virus with lower viral load in vivo in HIV-1 infection. The rate and timing of reversion are, however, highly variable. We quantified reversion to WT of a series of SIV and SHIV viruses containing minor amounts of WT virus in pigtail macaques using a sensitive PCR assay. Infection with mixes of EM and WT virus containing ≥10% WT virus results in immediate and rapid outgrowth of WT virus at SIV Gag CD8 T cell epitopes within 7 days of infection of pigtail macaques with SHIV or SIV. In contrast, infection with biologically passaged SHIVmn229 viruses with much smaller proportions of WT sequence, or a molecular clone of pure EM SIVmac239, demonstrated a delayed or slow pattern of reversion. WT virus was not detectable until ≥8 days after inoculation and took ≥8 weeks to become the dominant quasispecies. A delayed pattern of reversion was associated with significantly lower viral loads. The diversity of the infecting inoculum determines the timing of reversion to WT virus, which in turn predicts the outcome of infection. The delay in reversion of fitness-reducing CD8 T cell escape mutations in some scenarios suggests opportunities to reduce the pathogenicity of HIV during very early infection.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

UNSWorks

University of Melbourne Institutional Repository

Nephrin Regulates Lamellipodia Formation by Assembling a Protein Complex That Includes Ship2, Filamin and Lamellipodin

Authors: A Hogan, C Chang, C Jimenez, D Ravid, DL Goosney, DR Critchley, F Frischknecht, F Nakamura, FJ Byfield, GM Rivera, H Takala, IM Blasutig, J Lahdenpera, J Reiser, J Saarikangas, J Zhu, James Keen, JB Marchand, JM Dyson, JR Bamburg, K Asanuma, K Smith, Kamal Abuarquob, KG Campellone, KK Wary, LA Flanagan, LB Holzman, LE Rameh, Leslie Cook, LM Sly, M Kestila, M Krause, M Rantanen, Madhusudan Venkatareddy, MC Weiger, N Boute, N Jones, N Prasad, N Tikhmyanova, NK Prasad, P Defilippi, P Garg, Puneet Garg, PW Majerus, R Verma, Rakesh Verma, RK Vadlamudi, S Gruenheid, S Lehtonen, SJ Geier, T Habib, T Kiema, T Onji, TD Pollard, TF Martin, Y Hamano, Y Harita, Y Xu
Publication venue: Public Library of Science
Publication date: 14/12/2011

Actin dynamics has emerged at the forefront of podocyte biology. The slit diaphragm junctional adhesion protein Nephrin is necessary for development of the podocyte morphology and transduces phosphorylation-dependent signals that regulate cytoskeletal dynamics. The present study extends our understanding of Nephrin function by showing in cultured podocytes that actin dynamics induced by Nephrin activation is necessary for lamellipodia formation. Upon activation, Nephrin recruits and regulates a protein complex that includes Ship2 (SH2 domain containing 5′ inositol phosphatase), Filamin and Lamellipodin, proteins important in regulation of actin and focal adhesion dynamics, as well as lamellipodia formation. Using the previously described CD16-Nephrin clustering system, Nephrin ligation or activation resulted in phosphorylation of the actin crosslinking protein Filamin in a p21-activated kinase-dependent manner. Nephrin activation in cell culture results in formation of lamellipodia, a process that requires specialized actin dynamics at the leading edge of the cell along with focal adhesion turnover. In the CD16-Nephrin clustering model, Nephrin ligation resulted in abnormal morphology of actin tails in human podocytes when Ship2, Filamin or Lamellipodin were individually knocked down. We also observed decreased lamellipodia formation and cell migration in these knockdown cells. These data provide evidence that Nephrin not only initiates actin polymerization but also assembles a protein complex that is necessary to regulate the architecture of the generated actin filament network and focal adhesion dynamics.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Crystal Structure of Legionella DotD: Insights into the Relationship between Type IVB and Type II/III Secretion Systems

The Dot/Icm type IVB secretion system (T4BSS) is a pivotal determinant of Legionella pneumophila pathogenesis. L. pneumophila translocates more than 100 effector proteins into the host cytoplasm using the Dot/Icm T4BSS, modulating host cellular functions to establish a replicative niche within host cells. The T4BSS core complex spanning the inner and outer membranes is thought to be made up of at least five proteins: DotC, DotD, DotF, DotG and DotH. DotH is the outer membrane protein; its targeting depends on lipoproteins DotC and DotD. However, the core complex structure and assembly mechanism are still unknown. Here, we report the crystal structure of DotD at 2.0 Å resolution. The structure of DotD is distinct from that of VirB7, the outer membrane lipoprotein of the type IVA secretion system. In contrast, the C-terminal domain of DotD is remarkably similar to the N-terminal subdomain of secretins, the integral outer membrane proteins that form substrate conduits for the type II and the type III secretion systems (T2SS and T3SS). A short β-segment in the otherwise disordered N-terminal region, located on the hydrophobic cleft of the C-terminal domain, is essential for outer membrane targeting of DotH and Dot/Icm T4BSS core complex formation. These findings uncover an intriguing link between T4BSS and T2SS/T3SS.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central