Search CORE

194 research outputs found

Mean-Variance QTL Mapping Identifies Novel QTL for Circadian Activity and Exploratory Behavior in Mice.

Author: Corty Robert W
Kumar Vivek
Takahashi Joseph S
Tarantino Lisa M
Valdar William
Publication venue: The Mouseion at the JAXlibrary
Publication date: 10/12/2018

We illustrate, through two case studies, that mean-variance QTL mapping (QTL mapping that models effects on the mean and the variance simultaneously) can discover QTL that traditional interval mapping cannot. Mean-variance QTL mapping is based on the double generalized linear model, which extends the standard linear model used in interval mapping by incorporating not only a set of genetic and covariate effects for the mean but also a set of such effects for the residual variance. Its potential for use in QTL mapping has been described previously, but it remains underutilized, with certain key advantages undemonstrated until now. In the first case study, a reduced complexity intercross of C57BL/6J and C57BL/6N mice examining circadian behavior, our reanalysis detected a mean-controlling QTL for circadian wheel running activity that interval mapping did not; mean-variance QTL mapping was more powerful than interval mapping at the QTL because it accounted for the fact that mice homozygous for the C57BL/6N allele had less residual variance than other mice. In the second case study, an intercross between C57BL/6J and C58/J mice examining anxiety-like behaviors, our reanalysis detected a variance-controlling QTL for rearing behavior; interval mapping did not identify this QTL because it does not target variance QTL. We believe that the results of these reanalyses, which in other respects largely replicated the original findings, support the use of mean-variance QTL mapping as standard practice.

The Jackson Laboratory: The Mouseion at the JAXlibrary
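The difference between the homoscedastic model behind interval mapping and a mean-variance model can be sketched in Python. This is a hypothetical illustration, not the authors' software: it assumes a single locus with observed genotypes, a Gaussian trait, no covariates, and maximum-likelihood fitting. Under those assumptions a double generalized linear model with genotype in both the mean and the variance part is fitted by the per-genotype means and variances, so the likelihood-ratio (LR) statistics have closed forms. The function names are invented for this sketch, and it ignores the genotype uncertainty between markers that real interval mapping handles.

```python
import math
from collections import defaultdict


def gaussian_loglik(n, ss):
    """Maximized Gaussian log-likelihood of n values with residual sum of squares ss.

    At the MLE the variance is ss / n, so the log-likelihood is
    -n/2 * (log(2 * pi * ss / n) + 1). Requires ss > 0.
    """
    return -0.5 * n * (math.log(2 * math.pi * ss / n) + 1)


def mean_variance_lr(phenotypes, genotypes):
    """LR statistics for mean, variance, and joint effects of one categorical genotype.

    Returns (lr_mean, lr_var, lr_joint, k), where k is the number of genotype groups.
    Asymptotically under the null, lr_mean and lr_var are chi-square with k - 1 df
    and lr_joint with 2 * (k - 1) df. Each group needs at least two distinct values.
    """
    groups = defaultdict(list)
    for y, g in zip(phenotypes, genotypes):
        groups[g].append(y)
    n_total = len(phenotypes)

    # Null model: one mean and one variance for every animal.
    grand_mean = sum(phenotypes) / n_total
    ll_null = gaussian_loglik(n_total, sum((y - grand_mean) ** 2 for y in phenotypes))

    # Residual sum of squares around each genotype group's own mean.
    ss_by_group = {}
    for g, ys in groups.items():
        m = sum(ys) / len(ys)
        ss_by_group[g] = sum((y - m) ** 2 for y in ys)

    # Mean-only (homoscedastic) model: group means, one pooled variance.
    ll_mean = gaussian_loglik(n_total, sum(ss_by_group.values()))

    # Mean-variance model: group means and group-specific residual variances.
    ll_full = sum(gaussian_loglik(len(groups[g]), ss) for g, ss in ss_by_group.items())

    lr_mean = 2 * (ll_mean - ll_null)
    lr_var = 2 * (ll_full - ll_mean)
    lr_joint = 2 * (ll_full - ll_null)
    return lr_mean, lr_var, lr_joint, len(groups)


if __name__ == "__main__":
    # Hypothetical toy data: the "NN" group is far less variable than "BB".
    y = [9.0, 11.0, 8.5, 11.5, 10.0, 10.1, 9.9, 10.0]
    g = ["BB", "BB", "BB", "BB", "NN", "NN", "NN", "NN"]
    print(mean_variance_lr(y, g))
```

Because the three models are nested, the joint statistic is exactly the sum of the mean and variance statistics, which makes it easy to see when a locus is detected mainly through its variance effect.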

Subfamily specific conservation profiles for proteins based on n-gram patterns

Author: F Fogolari
GP Raghava
H Joe
H W.
I Bahar
JC Wootton
JE Coronado
JK Vries
John K Vries
MO Dayhoff
MS Johnson
PC Mahalanobis
QW Dong
R Karchin
RD Finn
S Henikoff
SF Altschul
WS Valdar
Xiong Liu
Y Hou
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: A new algorithm has been developed for generating conservation profiles that reflect the evolutionary history of the subfamily associated with a query sequence. It is based on n-gram patterns (NP{n,m}), which are sets of n residues and m wildcards in windows of size n+m. The generation of conservation profiles is treated as a signal-to-noise problem, where the signal is the count of n-gram patterns in target sequences that are similar to the query sequence and the noise is the count over all target sequences. The signal is differentiated from the noise by applying singular value decomposition to sets of target sequences rank-ordered by similarity to the query. Results: The new algorithm was used to construct 4,248 profiles from 120 randomly selected Pfam-A families. These were compared to profiles generated from multiple alignments using the consensus approach. The two profiles were similar whenever the subfamily associated with the query sequence was well represented in the multiple alignment. It was possible to construct subfamily-specific conservation profiles with the new algorithm for subfamilies with as few as five members. The speed of the new algorithm was comparable to that of the multiple alignment approach. Conclusion: Subfamily-specific conservation profiles can be generated by the new algorithm without a priori knowledge of family relationships or domain architecture. This is useful when the subfamily contains multiple domains with different levels of representation in protein databases. It may also be applicable when the subfamily sample size is too small for the multiple alignment approach.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
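The NP{n,m} patterns described in the abstract above (n residues and m wildcards in a window of size n+m) can be enumerated with a short Python sketch. It assumes that every placement of the m wildcards inside each window is allowed; the paper's exact placement rules may differ. The `signal_to_noise` helper is hypothetical and computes only a naive per-pattern ratio of signal counts to noise counts; it is not the singular value decomposition step the abstract describes.

```python
from collections import Counter
from itertools import combinations


def ngram_patterns(seq, n, m, wildcard="."):
    """Count NP{n,m}-style patterns: n residues plus m wildcards in windows of size n+m.

    Assumption: every placement of the m wildcards inside a window is allowed.
    """
    width = n + m
    counts = Counter()
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Each choice of wildcard positions turns the window into one pattern.
        for wild_positions in combinations(range(width), m):
            pattern = "".join(
                wildcard if i in wild_positions else residue
                for i, residue in enumerate(window)
            )
            counts[pattern] += 1
    return counts


def signal_to_noise(query_like, all_targets, n, m):
    """Hypothetical helper: per-pattern ratio of counts in query-like targets
    (signal) to counts over all targets (noise)."""
    signal, noise = Counter(), Counter()
    for s in query_like:
        signal.update(ngram_patterns(s, n, m))
    for s in all_targets:
        noise.update(ngram_patterns(s, n, m))
    return {p: signal[p] / noise[p] for p in signal if noise[p]}


if __name__ == "__main__":
    print(ngram_patterns("ACDE", 2, 1))
    print(signal_to_noise(["ACD"], ["ACD", "ACE"], 2, 1))
```

Patterns shared with dissimilar targets get diluted ratios (here "AC." appears in both targets), which is the intuition behind treating profile construction as a signal-to-noise problem.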

Background frequencies for residue variability estimates: BLOSUM revisited

Author: A del Sol Mesa
AG Murzin
C Sander
C Shannon
H Berman
I Mihalek
I Nooren
I Reš
J Donald
J Pei
K Pruitt
O Lichtarge
P Shenkin
R Development Core Team
R Edgar
S Altschul
S Henikoff
S Jones
S Kullback
S Veerassamy
T Pupko
W Atchley
W Valdar
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Shannon entropy applied to columns of multiple sequence alignments as a score of residue conservation has proven to be one of the most fruitful ideas in bioinformatics. This straightforward and intuitively appealing measure clearly shows the regions of a protein under increased evolutionary pressure, highlighting their functional importance. The inability of the column entropy to differentiate between residue types, however, limits its resolving power. Results: In this work we suggest generalizing Shannon's expression to a function with similar mathematical properties that, at the same time, includes the observed propensities of residue types to mutate to each other. To do that, we revisit the original construction of BLOSUM matrices and reinterpret them as mutation probability matrices. These probabilities are then used as background frequencies in the revised residue conservation measure. Conclusion: We show that joint entropy with BLOSUM-proportional probabilities as a reference distribution enables detection of protein functional sites comparable in quality to a time-costly maximum-likelihood evolution simulation method (rate4site), and offers greater resolution than the Shannon entropy alone, in particular when the available sequences are of narrow evolutionary scope.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
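The baseline the abstract above starts from, Shannon entropy of a single alignment column, is short enough to show in Python, together with a relative entropy against a background distribution for contrast. Both are standard textbook measures; neither is the BLOSUM-derived joint-entropy measure the paper proposes, and the uniform 20-residue background in the example is an assumption chosen only for illustration.

```python
import math
from collections import Counter


def column_entropy(column, base=2.0):
    """Shannon entropy H = -sum_a p_a * log(p_a) of one alignment column (gaps '-' ignored)."""
    residues = [r for r in column if r != "-"]
    counts = Counter(residues)
    total = len(residues)
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())


def relative_entropy(column, background, base=2.0):
    """Kullback-Leibler divergence D(p || q) of the column's residue frequencies p
    from a background distribution q (a dict covering every residue present).

    Shown for contrast only; this is NOT the paper's BLOSUM-based measure.
    """
    residues = [r for r in column if r != "-"]
    counts = Counter(residues)
    total = len(residues)
    return sum(
        (c / total) * math.log((c / total) / background[a], base)
        for a, c in counts.items()
    )


# Assumed uniform background over the 20 standard amino acids.
UNIFORM = {a: 1 / 20 for a in "ACDEFGHIKLMNPQRSTVWY"}

if __name__ == "__main__":
    for col in ["AAAA", "AAAV", "ACDE"]:
        print(col, round(column_entropy(col), 3), round(relative_entropy(col, UNIFORM), 3))
```

Low column entropy means high conservation. Note that "AAAV" and "AAAL" score identically under plain Shannon entropy even though V and L differ in how readily they substitute for A; that blindness to residue type is the limitation the abstract addresses.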

A Multiparent Advanced Generation Inter-Cross to Fine-Map Quantitative Traits in Arabidopsis thaliana

Identifying natural allelic variation that underlies quantitative trait variation remains a fundamental problem in genetics. Most studies have employed either simple synthetic populations with restricted allelic variation or performed association mapping on a sample of naturally occurring haplotypes. Both of these approaches have limitations; therefore, alternative resources for the genetic dissection of complex traits continue to be sought. Here we describe one such alternative, the Multiparent Advanced Generation Inter-Cross (MAGIC). This approach is expected to improve the precision with which QTL can be mapped, improving the outlook for QTL cloning. We present the first panel of MAGIC lines developed: a set of 527 recombinant inbred lines (RILs) descended from a heterogeneous stock of 19 intermated accessions of the plant Arabidopsis thaliana. These lines and the 19 founders were genotyped with 1,260 single nucleotide polymorphisms and phenotyped for development-related traits. Analytical methods were developed to fine-map quantitative trait loci (QTL) in the MAGIC lines by reconstructing the genome of each line as a mosaic of the founders. We show by simulation that QTL explaining 10% of the phenotypic variance will be detected in most situations with an average mapping error of about 300 kb, and that if the number of lines were doubled the mapping error would be under 200 kb. We also show how the power to detect a QTL and the mapping accuracy vary depending on QTL location. We demonstrate the utility of this new mapping population by mapping several known QTL with high precision and by finding novel QTL for germination data and bolting time. Our results provide strong support for similar ongoing efforts to produce MAGIC lines in other organisms.

Public Library of Science (PLOS)

OPUS

Crossref

Directory of Open Access Journals

PubMed Central

UCL Discovery

Oxford University Research Archive
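The idea of describing each MAGIC line as a mosaic of its founders, mentioned in the abstract above, can be illustrated with a deliberately crude Python sketch. The function `founder_mosaic` is hypothetical: it assumes fully homozygous lines, error-free genotypes, and ordered markers, and it greedily extends a segment while at least one founder matches every marker in it. The paper's reconstruction is probabilistic and is not reproduced here.

```python
def founder_mosaic(line_genotype, founders):
    """Greedy, hypothetical mosaic of one inbred line in terms of its founders.

    line_genotype: one allele character per ordered marker.
    founders: dict mapping founder name -> allele string at the same markers.
    Returns (start, end, compatible_founders) segments with end exclusive.
    """
    segments = []
    start = 0
    current = set(founders)
    for i, allele in enumerate(line_genotype):
        matching = {f for f, alleles in founders.items() if alleles[i] == allele}
        if not matching:
            raise ValueError(f"marker {i}: allele {allele!r} not carried by any founder")
        if current & matching:
            # Some founder still explains every marker in the current segment.
            current &= matching
        else:
            # No single founder explains the run: close it and start a new segment.
            segments.append((start, i, frozenset(current)))
            start, current = i, matching
    segments.append((start, len(line_genotype), frozenset(current)))
    return segments


if __name__ == "__main__":
    founders = {"F1": "AAAAAA", "F2": "BBBBBB", "F3": "AABBAB"}
    print(founder_mosaic("AAABBB", founders))
```

With 19 founders and 1,260 SNPs, many founders share alleles at any given marker, so segments often remain compatible with several founders; this ambiguity is one reason a probabilistic treatment is preferable to a greedy one.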

Exploring Protein-Protein Interactions as Drug Targets for Anti-cancer Therapy with In Silico Workflows

Author: A Goncearenco
A Marchler-Bauer
A Truszkowski
AA Bogan
B Graves
B Ma
BA Shoemaker
BJ Smith
CA Goble
CM Yates
D Petrey
E Cukuroglu
FP Davis
H Perez-Sanchez
HS Haase
J Bhagat
J Cinatl
JA Wells
K Wolstencroft
M Guharoy
M Li
M Petukh
M Tyagi
MK Gilson
MP Mazanetz
N Estrada-Ortiz
P Aloy
P Filippakopoulos
R Mosca
RR Thangudu
S Beisken
S Kim
S Shangary
S Teng
T Rolland
W Yang
WS Valdar
Y Wang
Y Zhao
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

We describe a computational protocol to aid the design of small molecule and peptide drugs that target protein-protein interactions, particularly for anti-cancer therapy. To achieve this goal, we explore multiple strategies, including finding binding hot spots, incorporating chemical similarity and bioactivity data, and sampling similar binding sites from homologous protein complexes. We demonstrate how to combine existing interdisciplinary resources with examples of semi-automated workflows. Finally, we discuss several major problems, including the occurrence of drug-resistant mutations, drug promiscuity, and the design of dual-effect inhibitors.

Affiliations:
Goncearenco, Alexander. National Institutes of Health; United States
Li, Minghui. Soochow University; China. National Institutes of Health; United States
Simonetti, Franco Lucio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina
Shoemaker, Benjamin A. National Institutes of Health; United States
Panchenko, Anna R. National Institutes of Health; United States

Crossref

CONICET Digital

Using the realized relationship matrix to disentangle confounding factors for the estimation of genetic variance components of complex traits

Author: AR Gilmour
BJ Hayes
CR Henderson
D Sorensen
DJ Spiegelhalter
DS Falconer
ES Lander
G Casella
HD Patterson
IS Duff
J-L Jannink
Julius HJ van der Werf
K Lange
L Andersson
M Lynch
ME Goddard
MI McCarthy
Michael E Goddard
MJ Sillanpää
N Risch
N Yi
NR Wray
P Green
PA Oliehoek
Peter M Visscher
PM Visscher
RB O'Hara
S Purcell
Sang Hong Lee
SH Lee
T Gillian
T Meuwissen
TA Sellers
TH Meuwissen
THE Meuwissen
W Valdar
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: In the analysis of complex traits, genetic effects can be confounded with non-genetic effects, especially when using full-sib families. Dominance and epistatic effects are typically confounded with additive genetic and non-genetic effects. This confounding may cause the estimated genetic variance components to be inaccurate and biased.

Crossref

Springer - Publisher Connector

PubMed Central

University of Melbourne Institutional Repository

University of Queensland eSpace

Predicting Unobserved Phenotypes for Complex Traits from Whole-Genome SNP Data

Author: A Legarra
A Scuteri
Ben J. Hayes
Bret A. Payseur
BW Zanke
C Libioulle
C-H Kao
CD Bennett
CJ Hoggart
CJ Willer
D Sorensen
DS Falconer
G Casella
H Wang
J Winkelmann
J-L Jannink
JB Harley
JD Rioux
Julius H. J. van der Werf
LJ Scott
M Lynch
M Yeager
MF Moffatt
Michael E. Goddard
MJ Sillanpää
MN Weedon
N Yi
NR Wray
P Green
Peter M. Visscher
PM Visscher
R Saxena
R Sladek
RC Jansen
S Sanna
S Xu
Sang Hong Lee
SH Lee
TH Meuwissen
THE Meuwissen
W Valdar
WG Hill
ZB Zeng
Publication venue: Public Library of Science
Publication date: 01/01/2008

Genome-wide association studies (GWAS) for quantitative traits and disease in humans and other species have shown that there are many loci that contribute to the observed resemblance between relatives. GWAS to date have mostly focussed on discovery of genes or regulatory regions harbouring causative polymorphisms, using single SNP analyses and setting stringent type-I error rates. Genome-wide marker data can also be used to predict genetic values and therefore predict phenotypes. Here, we propose a Bayesian method that utilises all marker data simultaneously to predict phenotypes. We apply the method to three traits: coat colour, %CD8 cells, and mean cell haemoglobin, measured in a heterogeneous stock mouse population. We find that a model that contains both additive and dominance effects, estimated from genome-wide marker data, is successful in predicting unobserved phenotypes and is significantly better than a prediction based upon the phenotypes of close relatives. Correlations between predicted and actual phenotypes were in the range of 0.4 to 0.9 when half of the families were used to estimate effects and the other half for prediction. Posterior probabilities of SNPs being associated with coat colour were high for regions that are known to contain loci for this trait. The prediction of phenotypes using large samples, high-density SNP data, and appropriate statistical methodology is feasible and can be applied in human medicine, forensics, or artificial selection programs.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

University of Melbourne Institutional Repository

University of Queensland eSpace
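The validation design in the abstract above (effects estimated in half of the families, accuracy measured as the correlation between predicted and actual phenotypes in the other half) can be sketched in Python. The Bayesian additive-plus-dominance model itself is not reproduced; predictions are assumed to come from elsewhere, and `split_families` and `pearson` are hypothetical helper names. Keeping whole families on one side of the split matters because relatives shared across the split would let a model score well simply by recognizing close kin.

```python
import math
import random


def split_families(family_ids, seed=0):
    """Put half of the families in training and the rest in validation.

    family_ids: one family label per animal. Whole families stay together,
    so no family contributes animals to both halves.
    Returns (train_indices, validation_indices).
    """
    families = sorted(set(family_ids))
    random.Random(seed).shuffle(families)
    train = set(families[: len(families) // 2])
    train_idx = [i for i, f in enumerate(family_ids) if f in train]
    valid_idx = [i for i, f in enumerate(family_ids) if f not in train]
    return train_idx, valid_idx


def pearson(x, y):
    """Pearson correlation between predicted (x) and observed (y) phenotypes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    fam = ["f1", "f1", "f2", "f2", "f3", "f3", "f4", "f4"]
    train, valid = split_families(fam, seed=1)
    print("train:", train, "validation:", valid)
```

In practice the model would be fitted on the training indices, used to predict the validation animals, and `pearson` applied to those predictions and the observed phenotypes.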

A Copine family member, Cpne8, is a candidate quantitative trait gene for prion disease incubation time in mouse

Author: A Darvasi
AG Dickinson
B Hitzemann
B Yalcin
CJ Talbot
CR Moreno
CT Allen
D Westaway
DA Harris
DA Stephenson
DT Kingsbury
Emma G. Maytham
GA Carlson
Holger Hummerich
J Flint
JL Tomsig
John Collinge
Julia Grizenkova
K Manolakou
O Abiola
R Mott
RB Petersen
RC Moore
RL Chandler
S Capellari
S Lloyd
Sarah E. Lloyd
SE Lloyd
TC Sudhof
W Valdar
Publication venue: Springer-Verlag
Publication date: 01/01/2009

Prion disease incubation time in mice is determined by many factors, including genetic background. The prion gene itself plays a major role in incubation time; however, other genes are also known to be important. Whilst quantitative trait loci (QTL) studies have identified multiple loci across the genome, these regions are often large, and with the exception of Hectd2 on Mmu19, no quantitative trait genes or nucleotides for prion disease incubation time have been demonstrated. In this study, we use the Northport heterogeneous stock of mice to reduce the size of a previously identified QTL on Mmu15 from approximately 25 to 1.2 cM. We further characterised the genes in this region and identified Cpne8, a member of the copine family, as the most promising candidate gene. We also show that Cpne8 mRNA is upregulated at the terminal stage of disease, supporting a role in prion disease. Applying these techniques to other loci will facilitate the identification of key pathways in prion disease pathogenesis.

Crossref

Springer - Publisher Connector

PubMed Central

Commercially Available Outbred Mice for Genome-Wide Association Studies

Genome-wide association studies using commercially available outbred mice can detect genes involved in phenotypes of biomedical interest. Useful populations need high-frequency alleles to ensure high power to detect quantitative trait loci (QTLs), low linkage disequilibrium between markers to obtain accurate mapping resolution, and an absence of population structure to prevent false positive associations. We surveyed 66 colonies for inbreeding, genetic diversity, and linkage disequilibrium, and we demonstrate that some have haplotype blocks of less than 100 kb, enabling gene-level mapping resolution. The same alleles contribute to variation in different colonies, so that when mapping progress stalls in one, another can be used in its stead. Colonies are genetically diverse: 45% of the total genetic variation is attributable to differences between colonies. However, quantitative differences in allele frequencies, rather than the existence of private alleles, are responsible for these population differences. The colonies derive from a limited pool of ancestral haplotypes resembling those found in inbred strains: over 95% of sequence variants segregating in outbred populations are found in inbred strains. Consequently, it is possible to impute the sequence of any mouse from a dense SNP map combined with inbred strain sequence data, which opens up the possibility of cataloguing and testing all variants for association, a situation that has so far eluded studies in completely outbred populations. We demonstrate the colonies' potential by identifying a deletion in the promoter of H2-Ea as the molecular change that strongly contributes to setting the ratio of CD4+ and CD8+ lymphocytes.

Public Library of Science (PLOS)

CiteSeerX

Crossref

Harvard University - DASH

Directory of Open Access Journals

HAL-Inserm

PubMed Central

UCL Discovery

Oxford University Research Archive
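The claim in the abstract above that a share of genetic variation lies between colonies, driven by differences in allele frequency, can be made concrete with Wright's F_ST for biallelic SNPs. The Python sketch below is an assumption-laden illustration: it weights colonies equally and combines SNPs as a ratio of sums, and it is not necessarily the statistic behind the reported 45% figure. The function name is hypothetical.

```python
def between_colony_fraction(snp_freqs):
    """Share of allele-frequency variation lying between colonies (Wright's F_ST).

    snp_freqs: one list per SNP giving that SNP's allele frequency in each colony,
    with colonies weighted equally. SNPs are combined as a ratio of sums,
    sum(var_p) / sum(pbar * (1 - pbar)); monomorphic SNPs add nothing to either sum.
    """
    num = 0.0
    den = 0.0
    for freqs in snp_freqs:
        k = len(freqs)
        pbar = sum(freqs) / k
        num += sum((p - pbar) ** 2 for p in freqs) / k  # between-colony variance
        den += pbar * (1 - pbar)                        # total expected variance
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical frequencies for three SNPs in three colonies.
    print(between_colony_fraction([[0.1, 0.5, 0.9], [0.4, 0.5, 0.6], [0.2, 0.2, 0.2]]))
```

A value of 0 means every colony has the same allele frequencies; a value of 1 means colonies are fixed for different alleles. Intermediate values arise from frequency differences alone, without any allele being private to one colony, which matches the abstract's point.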

Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties

Author: A Andreeva
A Gutteridge
AH Elcock
AR Panchenko
B Lee
B Rost
BW Mathews
CA Innis
Cathy H Wu
CH Wu
DK Smith
GJ Bartlett
H Yao
HM Berman
IH Witten
JC Platt
JD Thompson
JS Milton
K Kinoshita
K Sjolander
M Ota
MA Hearst
MJ Ondrechen
Natalia V Petrova
O Lichtarge
P Aloy
PP Wangikar
R Kohavi
R Koradi
R Landgraf
RL Tatusov
S Chakravarty
S Jones
S Parthasarathy
S Zhu
SF Altschul
SJ Campbell
SJ Hubbard
TA Binkowski
W Kabsch
W Tian
WSJ Valdar
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The number of protein sequences deriving from genome sequencing projects is outpacing our knowledge about the function of these proteins. With the gap between experimentally characterized and uncharacterized proteins continuing to widen, it is necessary to develop new computational methods and tools for functional prediction. Knowledge of catalytic sites provides valuable insight into protein function. Although many computational methods have been developed to predict catalytic residues and active sites, their accuracy remains low, with a significant number of false positives. In this paper, we present a novel method for the prediction of catalytic sites, using a carefully selected, supervised machine learning algorithm coupled with an optimal discriminative set of protein sequence conservation and structural properties. RESULTS: To determine the best machine learning algorithm, 26 classifiers in the WEKA software package were compared using a benchmarking dataset of 79 enzymes with 254 catalytic residues in a 10-fold cross-validation analysis. Each residue of the dataset was represented by a set of 24 residue properties previously shown to be of functional relevance, as well as a label {+1/-1} to indicate a catalytic/non-catalytic residue. The best-performing algorithm was the Sequential Minimal Optimization (SMO) algorithm, which is a Support Vector Machine (SVM). The Wrapper Subset Selection algorithm further selected seven of the 24 attributes as an optimal subset of residue properties, with sequence conservation, catalytic propensities of amino acids, and relative position on the protein surface being the most important features. CONCLUSION: The SMO algorithm with 7 selected attributes correctly predicted 228 of the 254 catalytic residues, with an overall predictive accuracy of more than 86%. Missing only 10.2% of the catalytic residues, the method captures the fundamental features of catalytic residues and can be used as a "catalytic residue filter" to facilitate experimental identification of catalytic residues for proteins with known structure but unknown function.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
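The 10.2% miss rate quoted in the abstract above follows directly from the reported counts, as a short Python check shows. The function name is hypothetical. The "more than 86%" overall accuracy also depends on how many non-catalytic residues were classified correctly, and those counts are not given in the abstract, so it is not recomputed here.

```python
def recall_and_miss_rate(true_positives, false_negatives):
    """Sensitivity (recall) and miss rate from confusion-matrix counts."""
    positives = true_positives + false_negatives
    recall = true_positives / positives
    return recall, 1.0 - recall


# Counts reported in the abstract: 228 of 254 catalytic residues predicted.
recall, miss = recall_and_miss_rate(228, 254 - 228)
print(f"recall = {recall:.1%}, missed = {miss:.1%}")  # recall = 89.8%, missed = 10.2%
```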