Prediction and classification for GPCR sequences based on ligand specific features

Author: F. Horn
G.E. Tusnády
K.R. Sreekumar
M. Bouvier
R. Karchin
S. Altshul
T. Gudermann
W. Pearson
Y. Huang
Publication venue: Lecture Notes in Computer Science
Publication date: 01/01/2006

Functional identification of G-Protein Coupled Receptors (GPCRs) is one of the current focus areas of pharmaceutical research. Although thousands of GPCR sequences are known, many of them are orphan sequences (the activating ligand is unknown). Therefore, classification methods for automated characterization of orphan GPCRs are imperative. In this study, a novel method for obtaining class-specific features, based on the existence of activating-ligand-specific patterns, has been developed for predicting Level 1 subfamilies of GPCRs and used in a majority voting classification. Exploiting the fact that there is a non-promiscuous relationship between the specific binding of GPCRs to their ligands and their functional classification, our method classifies Level 1 subfamilies of GPCRs with high predictive accuracy, between 87% and 99%, in a three-fold cross-validation test. The method also tells us which motifs are significant for class determination, which has important design implications. The presented machine learning approach bridges the gulf between the excess amount of GPCR sequence data and their poor functional characterization.
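The majority voting step mentioned in this abstract can be illustrated with a minimal sketch. The motif strings, class labels, and the `classify_by_majority` helper below are hypothetical placeholders chosen for illustration; they are not the features or the procedure used in the paper.

```python
from collections import Counter

# Hypothetical motif-to-class table (illustrative strings, not real GPCR motifs).
MOTIF_VOTES = {
    "WLP": "amine",
    "DRY": "amine",
    "CKK": "peptide",
    "FNR": "peptide",
    "TTG": "peptide",
}

def classify_by_majority(sequence, motif_votes=MOTIF_VOTES):
    """Return the class with the most motif votes, or None if no motif matches."""
    # Every motif found in the sequence casts one vote for its class.
    votes = Counter(cls for motif, cls in motif_votes.items() if motif in sequence)
    if not votes:
        return None
    # most_common(1) yields the top-voted class; a real system would need
    # an explicit tie-breaking rule.
    return votes.most_common(1)[0][0]

print(classify_by_majority("MWLPADRYCKK"))
```

Because each matching motif contributes a visible vote, a scheme like this also exposes which motifs drove a given class assignment.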

Crossref

Sabanci University Research Database

Integrating diverse genomic data using gene sets

Author: Karchin Rachel
Marchionni Luigi
Parmigiani Giovanni
Tyekucheva Svitlana
Publication venue: BioMed Central
Publication date: 01/01/2011

We introduce and evaluate data analysis methods to interpret simultaneous measurements of multiple genomic features made on the same biological samples. Our tools use gene sets to provide an interpretable common scale for diverse genomic information. We show that we can detect genetic effects even though they may act through different mechanisms in different samples, and that we can discover and validate important disease-related gene sets that would not be discovered by analyzing each data type individually.

Crossref

Harvard University - DASH

Springer - Publisher Connector

PubMed Central

MODBASE, a database of annotated comparative protein structure models and associated resources.

Author: Barkan David T
Carter Hannah
Davis Fred P
Eramian David
Eswar Narayanan
Karchin Rachel
Kelly Libusha
Mankoo Parminder
Marti-Renom Marc A
Pieper Ursula
Sali Andrej
Webb Ben M
Publication venue: eScholarship, University of California
Publication date: 23/10/2008

MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE currently contains 5,152,695 reliable models for domains in 1,593,209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/).

PubMed Central

eScholarship - University of California

SAM-T08, HMM-based protein structure prediction

Author: Altschul
Archie
Bernstein
Bystroff
de Brevern
Hodis
Hughey
K. Karplus
Kabsch
Karchin
Karchin
Murzin
Paluszewski
Rohl
Sayle
Van Der Spoel
Wang
Publication venue: Oxford University Press
Publication date: 01/01/2009

The SAM-T08 web server is a protein structure prediction server that provides several useful intermediate results in addition to the final predicted 3D structure: three multiple sequence alignments of putative homologs using different iterated search procedures, prediction of local structure features including various backbone and burial properties, calibrated E-values for the significance of template searches of PDB and residue–residue contact predictions. The server has been validated as part of the CASP8 assessment of structure prediction as having good performance across all classes of predictions. The SAM-T08 server is available at http://compbio.soe.ucsc.edu/SAM_T08/T08-query.htm

CiteSeerX

Crossref

PubMed Central

Classifying Variants of Undetermined Significance in BRCA2 with Protein Likelihood Ratios

Author: Agarwal Mukesh
Beattie Mary S.
Couch Fergus
Karchin Rachel
Sali Andrej
Publication venue: Libertas Academica
Publication date: 01/01/2008

Background: Missense (amino-acid changing) variants found in cancer predisposition genes often create difficulties when clinically interpreting genetic testing results. Although bioinformatics has developed approaches to predicting the impact of these variants, many of these approaches have not been readily applicable in the clinical setting. Bioinformatics approaches for predicting the impact of these variants have not yet found their footing in clinical practice because 1) interpreting the medical relevance of predictive scores is difficult; 2) the relationship between bioinformatics “predictors” (sequence conservation, protein structure) and cancer susceptibility is not understood. Methodology/Principal Findings: We present a computational method that produces a probabilistic likelihood ratio predictive of whether a missense variant impairs protein function. We apply the method to a tumor suppressor gene, BRCA2, whose loss of function is important to cancer susceptibility. Protein likelihood ratios are computed for 229 unclassified variants found in individuals from high-risk breast/ovarian cancer families. We map the variants onto a protein structure model, and suggest that a cluster of predicted deleterious variants in the BRCA2 OB1 domain may destabilize BRCA2 and a protein binding partner, the small acidic protein DSS1. We compare our predictions with variant “re-classifications” provided by Myriad Genetics, a biotechnology company that holds the patent on BRCA2 genetic testing in the U.S., and with classifications made by an established medical genetics model [1]. Our approach uses bioinformatics data that is independent of these genetics-based classifications and yet shows significant agreement with them. Preliminary results indicate that our method is less likely to make false positive errors than other bioinformatics methods, which were designed to predict the impact of missense mutations in general. Conclusions/Significance: Missense mutations are the most common disease-producing genetic variants. We present a fast, scalable bioinformatics method that integrates information about protein sequence, conservation, and structure in a likelihood ratio that can be integrated with medical genetics likelihood ratios. The protein likelihood ratio, together with medical genetics likelihood ratios, can be used by clinicians and counselors to communicate the relevance of a VUS to the individual who has that VUS. The approach described here is generalizable to regions of any tumor suppressor gene that have been structurally determined by X-ray crystallography or for which a protein homology model can be built.
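One common way to integrate several likelihood ratios is to multiply them into the prior odds, which is valid when the evidence sources are conditionally independent. The sketch below shows that arithmetic only; the function name, the prior, and the example ratios are illustrative assumptions, not values or code from the paper.

```python
import math

def posterior_probability(prior_prob, likelihood_ratios):
    """Combine a prior probability with independent likelihood ratios.

    Assumes the ratios are conditionally independent, so they multiply.
    """
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must be strictly between 0 and 1")
    # Convert the prior probability to odds.
    odds = prior_prob / (1.0 - prior_prob)
    # Each likelihood ratio scales the odds for or against "deleterious".
    for lr in likelihood_ratios:
        odds *= lr
    # Convert the posterior odds back to a probability.
    return odds / (1.0 + odds)

# Hypothetical inputs: a protein-based ratio of 4.0 and a genetics-based ratio of 2.5.
print(posterior_probability(0.1, [4.0, 2.5]))
```

A ratio above 1 raises the probability that the variant is deleterious, a ratio below 1 lowers it, and a ratio of exactly 1 leaves the prior unchanged.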

Directory of Open Access Journals

PubMed Central

CHASM and SNVBox: toolkit for detecting biologically important single nucleotide mutations in cancer

Author: Amit
Benjamini
Birney
Breiman
Carter
Carter
Dewey Kim
Forbes
Futreal
Hannah Carter
Kaminker
Karchin
Mark Diekhans
Michael C. Ryan
Mooney
Ng
Pruitt
Pruitt
Rachel Karchin
Subramanian
Sunyaev
Wing Chung Wong
Publication venue: Oxford University Press
Publication date: 01/01/2011

Summary: Thousands of cancer exomes are currently being sequenced, yielding millions of non-synonymous single nucleotide variants (SNVs) of possible relevance to disease etiology. Here, we provide a software toolkit to prioritize SNVs based on their predicted contribution to tumorigenesis. It includes a database of precomputed, predictive features covering all positions in the annotated human exome and can be used either stand-alone or as part of a larger variant discovery pipeline.

CiteSeerX

Crossref

PubMed Central

Pseudorapidity Distribution of Charged Particles in PbarP Collisions at root(s)= 630GeV

Author: Abe
Alner
Arnison
B Wilkens
Bengtsson
Bernard
C Biino
C Liapis
D Lynn
Ellett
Harr
Harr
J Zweizig
L Pesando
Liapis
M Medinnis
M Punturo
Marchesini
P Karchin
P Kreuzer
P Schlein
Paige
R Harr
S Erhan
S Palestini
Sjöstrand
Sjöstrand
W Hofmann
Wilkens
Publication venue: Elsevier BV
Publication date: 01/01/1997

Using a silicon vertex detector, we measure the charged particle pseudorapidity distribution over the range 1.5 to 5.5 using data collected from PbarP collisions at root s = 630 GeV. With a data sample of 3 million events, we deduce a result with an overall normalization uncertainty of 5%, and typical bin to bin errors of a few percent. We compare our result to the measurement of UA5, and the distribution generated by the Lund Monte Carlo with default settings. This is only the second measurement at this level of precision, and only the second measurement for pseudorapidity greater than 3.

Comment: 9 pages, 5 figures, LaTeX format. For ps file see http://hep1.physics.wayne.edu/harr/harr.html Submitted to Physics Letters
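For readers unfamiliar with the variable, pseudorapidity is conventionally defined from the polar angle θ relative to the beam axis as η = −ln tan(θ/2). The helper names below are illustrative, and the sketch only demonstrates this standard definition, not the analysis in the paper.

```python
import math

def pseudorapidity(theta):
    """Pseudorapidity for a polar angle theta in radians (0 < theta < pi)."""
    return -math.log(math.tan(theta / 2.0))

def polar_angle(eta):
    """Inverse mapping: polar angle in radians for a given pseudorapidity."""
    return 2.0 * math.atan(math.exp(-eta))

# Angular coverage implied by the quoted pseudorapidity range.
for eta in (1.5, 5.5):
    print(eta, math.degrees(polar_angle(eta)))
```

Under this definition, the range 1.5 to 5.5 corresponds to polar angles from roughly 25° down to about 0.5° from the beam, which is the forward region.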

arXiv.org e-Print Archive

Crossref

CERN Document Server

On the hierarchical classification of G Protein-Coupled Receptors

Author: A. A. Freitas
A. Secker
Attwood
Bhasin
Bhasin
Bissantz
Cardoso
Christopoulos
D. R. Flower
Das
Davies
Flower
Flower
Foord
Gether
Gloriam
Guo
Horn
Hébert
J. Timmis
Karchin
Keerthi
Klabunde
Kolakowski
Lapinsh
M. Mendao
M. N. Davies
Milligan
Papasaikas
Prabhu
Sandberg
Schiöth
Publication venue: Oxford University Press (OUP)
Publication date: 22/10/2007

Motivation: G protein-coupled receptors (GPCRs) play an important role in many physiological systems by transducing an extracellular signal into an intracellular response. Over 50% of all marketed drugs are targeted towards a GPCR. There is considerable interest in developing an algorithm that could effectively predict the function of a GPCR from its primary sequence. Such an algorithm is useful not only in identifying novel GPCR sequences but in characterizing the interrelationships between known GPCRs. Results: An alignment-free approach to GPCR classification has been developed using techniques drawn from data mining and proteochemometrics. A dataset of over 8000 sequences was constructed to train the algorithm. This represents one of the largest GPCR datasets currently available. A predictive algorithm was developed based upon the simplest reasonable numerical representation of the protein's physicochemical properties. A selective top-down approach was developed, which used a hierarchical classifier to assign sequences to subdivisions within the GPCR hierarchy. The predictive performance of the algorithm was assessed against several standard data mining classifiers and further validated against Support Vector Machine-based GPCR prediction servers. The selective top-down approach achieves significantly higher accuracy than standard data mining methods in almost all cases.
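The general shape of top-down hierarchical classification can be sketched as a walk from the root of a class tree to a leaf, with a local decision at each node. The `Node` structure, the toy threshold rules, and the labels below are hypothetical; the paper's classifiers, features, and per-node classifier selection are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Node:
    label: str
    # choose() maps a feature vector to the label of one child; unused for leaves.
    choose: Optional[Callable[[List[float]], str]] = None
    children: Dict[str, "Node"] = field(default_factory=dict)

def classify_top_down(root: Node, features: List[float]) -> List[str]:
    """Descend from the root, letting each node's local rule pick one child."""
    path = []
    node = root
    while node.children:
        node = node.children[node.choose(features)]
        path.append(node.label)
    return path

# Toy two-level hierarchy with threshold rules standing in for trained classifiers.
family_a = Node(
    "A",
    choose=lambda f: "A1" if f[1] < 0.5 else "A2",
    children={"A1": Node("A1"), "A2": Node("A2")},
)
root = Node(
    "root",
    choose=lambda f: "A" if f[0] < 0.5 else "B",
    children={"A": family_a, "B": Node("B")},
)

print(classify_top_down(root, [0.2, 0.9]))
```

In a selective variant, each node could carry whichever classifier type performed best on that node's training data, which the `choose` callable leaves open.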

CiteSeerX

Crossref

Aberystwyth Research Portal

Kent Academic Repository

Accumulation of driver and passenger mutations during tumor progression

Author: B. Vogelstein
Barrick
Beerenwinkel
D. Kim
Dewanji
Durrett
Giardiello
Giardiello
Greenman
H. Carter
H. Ohtsuki
Haber
Huang
I. Bozic
K. W. Kinzler
Klein
Knudson
Komarova
Kraus-Ruppert
Lengauer
Ley
Louis
M. A. Nowak
Maley
Mimeault
Moolgavkar
Moolgavkar
R. Karchin
S. Chen
Simpson
T. Antal
Teschendorff
Vogelstein
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 08/12/2009

Major efforts to sequence cancer genomes are now occurring throughout the world. Though the emerging data from these studies are illuminating, their reconciliation with epidemiologic and clinical observations poses a major challenge. In the current study, we provide a novel mathematical model that begins to address this challenge. We model tumors as a discrete-time branching process that starts with a single driver mutation and proceeds as each new driver mutation leads to a slightly increased rate of clonal expansion. Using the model, we observe tremendous variation in the rate of tumor development, which provides an understanding of the heterogeneity in tumor sizes and development times that has been observed by epidemiologists and clinicians. Furthermore, the model provides a simple formula for the number of driver mutations as a function of the total number of mutations in the tumor. Finally, when applied to recent experimental data, the model allows us to calculate, for the first time, the actual selective advantage provided by typical somatic mutations in human tumors in situ. This selective advantage is surprisingly small, 0.005 ± 0.0005, and has major implications for experimental cancer research.
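A discrete-time branching process of this kind can be sketched in a few lines. The abstract does not give the update rules, so the following are assumptions made purely for illustration: a cell with k drivers dies with probability (1 − s)^k / 2 and otherwise divides, and each daughter independently gains a new driver with probability u. The function name and parameters are hypothetical.

```python
import random

def simulate_tumor(steps, s, u, seed=0):
    """Return {driver_count: cell_count} after the given number of time steps.

    Assumed rules (not taken from the abstract): a cell with k drivers dies
    with probability 0.5 * (1 - s) ** k, otherwise it divides in two, and each
    daughter gains one new driver with probability u.
    """
    rng = random.Random(seed)
    counts = {1: 1}  # start from a single cell carrying one driver mutation
    for _ in range(steps):
        new_counts = {}
        for k, n in counts.items():
            death_prob = 0.5 * (1.0 - s) ** k
            for _ in range(n):
                if rng.random() < death_prob:
                    continue  # the cell dies and leaves no descendants
                for _ in range(2):  # the cell divides into two daughters
                    k_new = k + 1 if rng.random() < u else k
                    new_counts[k_new] = new_counts.get(k_new, 0) + 1
        counts = new_counts
    return counts

# Small illustrative run; the per-cell loop is only practical for small populations.
print(simulate_tumor(steps=20, s=0.005, u=0.01))
```

Repeating the run with different seeds shows frequent early extinction alongside occasional rapid growth, the kind of run-to-run variation a branching process produces.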

arXiv.org e-Print Archive

CiteSeerX

Crossref

Harvard University - DASH

PubMed Central