3 research outputs found

Informative SNP Selection and Validation

Author: Mohan Babu Diana
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/01/2007

The search for genetic regions associated with complex diseases, such as cancer or Alzheimer's disease, is an important challenge that may lead to better diagnosis and treatment. The existence of millions of DNA variations, primarily single nucleotide polymorphisms (SNPs), may allow the fine dissection of such associations. However, studies seeking disease associations are limited by the cost of genotyping SNPs. It is therefore essential to find a small subset of informative SNPs (tag SNPs) that can serve as good representatives of the remaining SNPs. Several informative SNP selection methods have been developed. In our experiments, our methods compare favorably with all the prediction and statistical methods, selecting the fewest informative SNPs. We also proposed algorithms for faster prediction that yielded an acceptable trade-off. We validated our results using the k-fold test and its many variations.
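The k-fold validation mentioned in the abstract can be illustrated with a minimal sketch. It assumes haplotypes are tuples of 0/1 alleles and uses a simple majority-vote lookup as a stand-in predictor; the function names (`k_fold_indices`, `predict_from_tags`, `cross_validated_accuracy`) and the toy data are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def predict_from_tags(train, tags, target, query):
    """Stand-in predictor (an assumption, not the thesis method):
    majority vote over training rows whose tag alleles match the query."""
    votes = Counter(row[target] for row in train
                    if all(row[t] == query[t] for t in tags))
    if not votes:  # unseen tag pattern: fall back to the overall majority
        votes = Counter(row[target] for row in train)
    return votes.most_common(1)[0][0]


def cross_validated_accuracy(haplotypes, tags, k):
    """Average accuracy of predicting non-tag SNPs on held-out folds (k >= 2)."""
    n_snps = len(haplotypes[0])
    non_tags = [j for j in range(n_snps) if j not in tags]
    correct = total = 0
    for fold in k_fold_indices(len(haplotypes), k):
        held_out = set(fold)
        train = [h for i, h in enumerate(haplotypes) if i not in held_out]
        for i in fold:
            for j in non_tags:
                guess = predict_from_tags(train, tags, j, haplotypes[i])
                correct += guess == haplotypes[i][j]
                total += 1
    return correct / total


# Toy data: SNP 1 copies SNP 0 and SNP 2 is its complement.
toy = [(0, 0, 1), (1, 1, 0)] * 3
accuracy = cross_validated_accuracy(toy, tags=[0], k=3)
```

On this toy set a single tag determines the other two SNPs, so every held-out allele is recovered; real data would show a trade-off between the number of tags and accuracy.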

CiteSeerX

Machine Learning and Graph Theory Approaches for Classification and Prediction of Protein Structure

Author: Altun Gulsah
Publication venue: ScholarWorks @ Georgia State University
Publication date: 22/04/2008

Recently, many methods have been proposed for classification and prediction problems in bioinformatics. One of these problems is protein structure prediction, for which machine learning approaches and new algorithms have been proposed. Among the machine learning approaches, Support Vector Machines (SVMs) have attracted a lot of attention due to their high prediction accuracy. Since protein data consists of sequence and structural information, another widely used approach for modeling this structured data is to use graphs. Graph theory has been widely studied in computer science; however, it has only recently been applied to bioinformatics. In this work, we introduce new algorithms based on statistical methods, graph theory concepts, and machine learning for the protein structure prediction problem. We introduce a new statistical method based on z-scores for seed selection in proteins, and a new feature selection method based on finding common cliques in protein data, which reduces noise in the data. We also introduce new binary classifiers for the prediction of structural transitions in proteins. These new binary classifiers achieve much higher accuracy than the current traditional binary classifiers.

Haplotype Tagging using Support Vector Machines

Authors: Er Zelikovsky, Gulsah Altun, Jingwu He, Jun Zhang, Yanqing Zhang

Abstract: Constructing a complete human haplotype map can help in associating complex diseases with SNPs (single nucleotide polymorphisms). Unfortunately, the number of SNPs is very large and it is costly to sequence many individuals. Therefore, it is desirable to reduce the number of SNPs that must be sequenced to a small number of informative representatives called tag SNPs. Depending on the application, tagging can either save budget by inferring non-tag SNPs from tag SNPs or shorten the lengthy and difficult-to-handle SNP sequences obtained from the Affymetrix Map Array. Tagging should first choose which SNPs to use as tags and then predict the unknown non-tag SNPs from the known tags. In this paper we propose a new SNP prediction method based on a robust classification tool, the Support Vector Machine (SVM). For tag selection we use a fast stepwise tag selection algorithm. An extensive experimental study on various datasets, including 3 regions from HapMap, shows that tag selection based on SVM SNP prediction can reach the same prediction accuracy as the methods of Halldorson et al. [7] on the LPL dataset using significantly fewer tags. For example, our method reaches 90% SNP prediction accuracy using only 3 tags for the Daly et al. [6] dataset with 103 SNPs. The proposed tagging method is also more accurate (but considerably slower) than the multivariate linear regression method of He et al. [12].
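The two-stage idea in this abstract (choose tags, then predict non-tag SNPs from them) can be sketched as a greedy stepwise loop. This is an illustration under stated assumptions, not the authors' algorithm: a majority-vote lookup table stands in for the SVM predictor, accuracy is measured on the same haplotypes used to build the table, and the names `prediction_accuracy` and `stepwise_tag_selection` are hypothetical.

```python
from collections import Counter, defaultdict


def prediction_accuracy(haplotypes, tags):
    """Fraction of non-tag alleles recovered by a majority-vote lookup
    keyed on the tag alleles (stand-in for the paper's SVM predictor)."""
    n_snps = len(haplotypes[0])
    non_tags = [j for j in range(n_snps) if j not in tags]
    if not non_tags:
        return 1.0
    # For each tag pattern, count the alleles observed at every non-tag SNP.
    table = defaultdict(lambda: defaultdict(Counter))
    for h in haplotypes:
        key = tuple(h[t] for t in tags)
        for j in non_tags:
            table[key][j][h[j]] += 1
    correct = 0
    for h in haplotypes:
        key = tuple(h[t] for t in tags)
        for j in non_tags:
            correct += table[key][j].most_common(1)[0][0] == h[j]
    return correct / (len(haplotypes) * len(non_tags))


def stepwise_tag_selection(haplotypes, target_accuracy):
    """Greedily add the SNP that most improves accuracy until the target is met."""
    n_snps = len(haplotypes[0])
    tags = []
    while (prediction_accuracy(haplotypes, tags) < target_accuracy
           and len(tags) < n_snps):
        candidates = [j for j in range(n_snps) if j not in tags]
        best = max(candidates,
                   key=lambda j: prediction_accuracy(haplotypes, tags + [j]))
        tags.append(best)
    return tags


# Toy data: SNP 1 copies SNP 0, and SNP 3 is the complement of SNP 2.
toy = [(0, 0, 0, 1), (0, 0, 1, 0), (1, 1, 0, 1), (1, 1, 1, 0)]
selected = stepwise_tag_selection(toy, target_accuracy=1.0)
```

Here the loop picks one SNP from each correlated pair. In practice accuracy would be estimated on held-out data, and the lookup table would be replaced by a trained classifier such as an SVM.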

CiteSeerX