
    Improved Algorithms for the Selection of Tag SNPs

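    Abstract in Chinese (English translation)

    Recent studies show that patterns of linkage disequilibrium (LD) in human populations form a block-like structure. A whole chromosome can be partitioned into high-LD regions alternating with low-LD regions. The high-LD regions are called haplotype blocks. Within a haplotype block the number of haplotype patterns is limited, so a small number of single nucleotide polymorphisms (SNPs) is enough to tell the patterns apart. These few SNPs are called tag SNPs. To find the smallest set of tag SNPs, we propose a method that combines branch-and-bound with a greedy algorithm. The method explores a larger solution space and so obtains better solutions than the traditional greedy algorithm. It also lets the user trade efficiency against solution quality. We implemented the algorithm and tested it on a variety of simulated and biological data. The experimental results indicate that our method finds fewer tag SNPs than previous methods. The method can also be applied quite generally to other problems solvable by greedy algorithms. In addition, by incorporating data on the correlation between every pair of SNPs on a chromosome, we can reduce the number of tag SNPs. Some tag SNPs are perfectly correlated with other tag SNPs, so tag SNPs whose values can be inferred through such perfect correlation can be removed from the set originally found. Based on this idea, we propose two methods for reducing the number of tag SNPs. The first post-processes the tag SNPs found by an existing algorithm. The second takes the correlation between SNPs into account from the start of tag SNP selection. The experimental results show that both methods reduce the number of tag SNPs without losing the information carried by the original tag SNPs.

    Abstract

    Recent studies have shown that the patterns of linkage disequilibrium (LD) observed in the human population reveal a block-like structure. The entire chromosome can be partitioned into high-LD regions interspersed with low-LD regions. The high-LD regions are usually called "haplotype blocks". Within a haplotype block there are only a few haplotype patterns, and a small subset of SNPs (called tag SNPs) is sufficient to distinguish these patterns. To solve the problem of finding tag SNPs, we propose a hybrid method that combines the ideas of the branch-and-bound method and the greedy algorithm. This method explores a larger solution space to obtain a better solution than a traditional greedy algorithm. It also allows the user to trade off the efficiency of the program against the quality of solutions. This algorithm has been implemented and tested on a variety of simulated and biological data. The experimental results indicate that our program can find better solutions than previous methods. This approach is quite general, since it can be used to adapt other greedy algorithms to their corresponding problems. In addition, we can reduce the number of tag SNPs even further by considering the extent of linkage disequilibrium in the human genome.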
We also show that the extent of LD can be used to speed up the heavy computation of pairwise LD, and we give a faster algorithm for it. We propose two methods: the first is a posterior approach based on existing algorithms, and the second identifies tag SNPs by considering the correlation between SNPs from the beginning. The experimental results show that our methods reduce the number of tag SNPs in comparison with previous methods and that efficiency is significantly improved.

Title
Table of Contents
Acknowledgements
Abstract in Chinese
Abstract
1 Introduction
2 A Greedier Approach for Finding Tag SNPs
    2.1 The Problem of Minimizing the Number of Tag SNPs
    2.2 Greedy-Partition-Tree
    2.3 Improvement of Efficiency and Solutions
        2.3.1 Improvement of Efficiency
        2.3.2 Improvement of Solutions
    2.4 Experimental Results
        2.4.1 Experimental Results on Simulated Data
        2.4.2 Experimental Results on Biological Data
    2.5 Discussion
        2.5.1 Tradeoff between Efficiency and Solutions of GPT
        2.5.2 Comparison with MLR-Tagging
3 Algorithms for Reducing the Number of Tag SNPs by the Perfect Proxy Set
    3.1 Minimizing the Number of LD Bins
    3.2 The Comparison between the Block-Based Method and the LD-Based Method
    3.3 Methods for Reducing the Number of Tag SNPs by the Perfect Proxy Set
        3.3.1 The First Method
        3.3.2 The Second Method
    3.4 Experimental Results
    3.5 Reducing the Number of Global Tag SNPs by Relaxing the Threshold
4 Conclusion
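
The abstract describes combining a greedy algorithm with branch-and-bound to find a small set of tag SNPs. The sketch below illustrates that general idea only. It is not the thesis's Greedy-Partition-Tree algorithm, whose details are not given in this record. It assumes a non-empty list of haplotype patterns given as equal-length strings with one allele character per SNP. The names select_tags and unresolved_pairs and the width parameter are hypothetical, chosen for illustration.

```python
from itertools import combinations


def unresolved_pairs(patterns, chosen):
    """Index pairs of patterns that agree on every SNP in `chosen`."""
    return {
        (i, j)
        for i, j in combinations(range(len(patterns)), 2)
        if all(patterns[i][s] == patterns[j][s] for s in chosen)
    }


def select_tags(haplotypes, width=1):
    """Return sorted SNP indices that distinguish all distinct patterns.

    width=1 follows the plain greedy choice at every step. width>1 also
    tries the next-best candidates and prunes any branch that cannot beat
    the best solution found so far (a simple branch-and-bound).
    This is an illustrative assumption, not the thesis's algorithm.
    """
    patterns = sorted(set(haplotypes))  # duplicate patterns need no separating
    n_snps = len(patterns[0])
    best = list(range(n_snps))  # trivial upper bound: keep every SNP

    def search(chosen, pairs):
        nonlocal best
        if not pairs:  # every pair of patterns is now distinguishable
            if len(chosen) < len(best):
                best = list(chosen)
            return
        if len(chosen) + 1 >= len(best):  # bound: at least one more SNP needed
            return
        # Score each unused SNP by how many unresolved pairs it separates.
        scored = []
        for s in range(n_snps):
            if s in chosen:
                continue
            gain = sum(patterns[i][s] != patterns[j][s] for i, j in pairs)
            if gain:
                scored.append((gain, s))
        scored.sort(key=lambda t: (-t[0], t[1]))  # best gain first, ties by index
        for _, s in scored[:width]:
            remaining = {
                (i, j) for i, j in pairs if patterns[i][s] == patterns[j][s]
            }
            search(chosen + [s], remaining)

    search([], unresolved_pairs(patterns, []))
    return sorted(best)
```

For the patterns "0000", "0011", "1100" and "1111", SNPs 0 and 1 carry the same information, as do SNPs 2 and 3, so select_tags returns [0, 2]. Raising width widens the search at the cost of running time, which mirrors the efficiency versus solution-quality tradeoff mentioned in the abstract.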