
    Linked region detection using high-density SNP genotype data via the minimum recombinant model of pedigree haplotype inference

    Background: With the rapid development of high-throughput genotyping technologies, efficient methods for identifying linked regions from high-density SNP genotype data have become increasingly important. A deterministic method that works very well on SNP genotype data was recently developed (Lin et al. Bioinformatics 2008, 24(1): 86–93). However, that program can handle only a limited number of family structures. In particular, its results, if any, are poor when the genotype data for the whole chromosome of one parent in a nuclear family are missing. Results: We have developed a software package, LIden, for identifying linked regions from high-density SNP genotype data. We focus on the case where the genotype data for the whole chromosome of one parent in a nuclear family are missing. We use the minimum recombinant model for haplotype inference in pedigrees. Several local optimization algorithms are used to infer the haplotype of each individual, and the linked regions are then determined from the inferred haplotype data. We have also developed a more flexible method for combining nuclear families to further refine (shorten) the linked regions. Conclusion: LIden is efficient software for linked region detection using high-density SNP genotype data. It handles some important cases where existing programs do not work well; in particular, it can handle many cases where the genotype data of one of the two parents are missing for the entire chromosome. The running time of the program is O(mn), where m is the number of members in the family and n is the number of SNP sites on the chromosome. LIden is particularly suitable for large families. This research also demonstrates another practical use of the minimum recombinant model for haplotype inference in pedigrees. The software package can be downloaded at http://www.cs.cityu.edu.hk/~lwang/software/Link.
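    The minimum recombinant model can be illustrated on a single parent-to-child transmission. The Python sketch below is not LIden's code: it assumes the parent's two haplotypes (h0, h1) and the haplotype the child received from that parent are already known, and all names are hypothetical. A dynamic program labels every SNP with the parental haplotype it was copied from while minimizing the number of origin switches (recombinants). Each transmission takes O(n) time, so repeating it over the m family members is consistent with the O(mn) order quoted above, although LIden's local optimization steps may differ.

```python
import math
from typing import List, Sequence, Tuple


def min_recombinant_origins(
    h0: Sequence[int], h1: Sequence[int], child: Sequence[int]
) -> Tuple[int, List[int]]:
    """Label each SNP of a child's transmitted haplotype with the parental
    haplotype (0 or 1) it was copied from, minimizing the number of origin
    switches (recombinants). Dynamic programming, O(n) for n SNP sites.
    Hypothetical helper for illustration, not LIden's implementation."""
    n = len(child)
    if len(h0) != n or len(h1) != n:
        raise ValueError("all haplotypes must have the same length")
    if n == 0:
        return 0, []
    haps = (h0, h1)
    # cost[o]: fewest recombinants on sites 0..i, given site i was copied from haplotype o
    cost = [0 if haps[o][0] == child[0] else math.inf for o in (0, 1)]
    # back[i][o]: origin at site i-1 on the best path ending in origin o at site i
    back: List[Tuple[int, int]] = [(0, 1)]  # back[0] is an unused placeholder
    for i in range(1, n):
        new_cost, ptr = [math.inf, math.inf], [0, 1]
        for o in (0, 1):
            if haps[o][i] != child[i]:
                continue  # the child's allele rules out this origin at site i
            stay, switch = cost[o], cost[1 - o] + 1
            new_cost[o], ptr[o] = (stay, o) if stay <= switch else (switch, 1 - o)
        cost = new_cost
        back.append((ptr[0], ptr[1]))
    best = 0 if cost[0] <= cost[1] else 1
    if math.isinf(cost[best]):
        raise ValueError("child haplotype is inconsistent with both parental haplotypes")
    # trace the optimal origin labels backwards
    origins = [0] * n
    origins[-1] = best
    for i in range(n - 1, 0, -1):
        origins[i - 1] = back[i][origins[i]]
    return int(cost[best]), origins


h0 = [0, 0, 0, 0, 1, 1]
h1 = [1, 1, 1, 1, 0, 0]
child = [0, 0, 1, 1, 0, 0]
print(min_recombinant_origins(h0, h1, child))
```

    Sites where both parental haplotypes carry the same allele are uninformative; the program simply extends the current origin across them, which is what keeps the recombinant count minimal.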

    Most parsimonious haplotype allele sharing determination

    Background: The "common disease – common variant" hypothesis and genome-wide association studies have achieved numerous successes in the last three years, particularly in the genetic mapping of human diseases. Nevertheless, the power of association study methods is still low, in particular for quantitative traits, and a description of the full allelic spectrum is still deemed far from reach. Given the increasing density of available single nucleotide polymorphisms (SNPs), and motivated by the block-like structure of the human genome, a popular and productive strategy is to use haplotypes to capture the correlation structure of SNPs in regions of little recombination. The key to the success of this strategy is therefore the ability to unambiguously determine the haplotype allele sharing status among pedigree members. Association studies based on haplotype sharing status would have significantly fewer degrees of freedom and would be able to capture the combined effects of tightly linked causal variants. Results: For pedigree genotype datasets with a medium density of SNPs, we present two methods for determining the haplotype allele sharing status among pedigree members. An extensive simulation study showed that both methods performed nearly perfectly on breakpoint discovery, mutation haplotype allele discovery, and shared chromosomal region discovery. Conclusion: For pedigree genotype datasets, the haplotype allele sharing status among members can be determined deterministically, efficiently, and accurately, even for very small pedigrees. Given their excellent performance, the presented programs for determining haplotype allele sharing status can be useful in many downstream applications, including haplotype-based association studies.
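    Once each SNP on a child's chromosome is labeled with the parental haplotype it was inherited from, breakpoints and shared chromosomal regions can be read off directly. The sketch below assumes such 0/1 origin labels for two siblings, relative to the same parent, are already available; it does not reproduce either of the two methods presented, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def breakpoints(origins: Sequence[int]) -> List[int]:
    """SNP indices i where the inherited parental haplotype changes
    between site i-1 and site i (a recombination breakpoint)."""
    return [i for i in range(1, len(origins)) if origins[i] != origins[i - 1]]


def shared_regions(a: Sequence[int], b: Sequence[int]) -> List[Tuple[int, int]]:
    """Maximal half-open SNP intervals [start, end) in which two siblings
    inherited the same parental haplotype, i.e. share the haplotype allele."""
    if len(a) != len(b):
        raise ValueError("origin label sequences must have equal length")
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y and start is None:
            start = i  # a shared run begins
        elif x != y and start is not None:
            regions.append((start, i))  # the shared run ends before site i
            start = None
    if start is not None:
        regions.append((start, len(a)))
    return regions


sib1 = [0, 0, 0, 1, 1, 1, 1]
sib2 = [0, 0, 1, 1, 1, 0, 0]
print(breakpoints(sib1), breakpoints(sib2), shared_regions(sib1, sib2))
```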

    Genetic Characterization of the Pee Dee Cotton Breeding Program

    The history of cotton breeding in the southeastern United States is multifaceted and complex. Public and private breeding programs have driven cotton's genetic development over the past two centuries. The Pee Dee breeding program in Florence, South Carolina, has had a substantial role in the development of well-adapted cotton cultivars with improved fiber strength, fiber length, and performance in farmers' fields. Despite the historic importance of the cotton germplasm lines and varieties from the Pee Dee program, little has been done to characterize the population structure and genetic architecture of key traits in this closed breeding program. Here, I first provide an in-depth exploration of the rich history of cotton breeding and genetics over the past century to provide context for the remainder of this thesis. Then, I discuss the interface of breeding goals, population genetics, and historical implications for a representative sample spanning more than 85 years of cotton breeding in the Pee Dee program. Once the family structure had been evaluated, I applied modern statistical methodology to find gene haplotypes associated with improved fiber quality or field performance and attempted to trace the origin of some beneficial alleles. Lastly, I discuss the implications of our work and how it may influence future breeding efforts to utilize the germplasm from this diverse cotton collection.

    A comparison of SNPs and microsatellites as linkage mapping markers: lessons from the zebra finch (Taeniopygia guttata)

    Background: Genetic linkage maps are essential tools when searching for quantitative trait loci (QTL). To maximize genome coverage and provide an evenly spaced marker distribution, a combination of different types of genetic marker is sometimes used. In this study we created linkage maps of four zebra finch (Taeniopygia guttata) chromosomes (1, 1A, 2 and 9) using two types of marker: Single Nucleotide Polymorphisms (SNPs) and microsatellites. To assess the effectiveness and accuracy of each kind of marker, we compared maps built with each marker type separately and with both types combined. Linkage map marker order was validated by comparison with the assembled zebra finch genome sequence. Results: We showed that marker order was less reliable and linkage map lengths were inflated for microsatellite maps relative to SNP maps, apparently due to differing error rates between the two types of marker. Guidelines on how to minimise the effects of error are provided. In particular, we show that when combining both types of marker, the conventional process of building linkage maps, whereby the most informative markers are added to the map first, has to be modified to improve map accuracy. Conclusions: When using multiple types and large numbers of markers to create dense linkage maps, the least error-prone loci (SNPs), rather than the most informative, should be used to create framework maps before other, potentially more error-prone markers (microsatellites) are added. This raises questions about the accuracy of marker order and predicted recombination rates in previous microsatellite linkage maps created using the conventional building process. However, provided suitable error detection strategies are followed, microsatellite-based maps can continue to be regarded as reasonably reliable.
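    Why genotyping errors inflate map lengths can be seen from a mapping function. The sketch below uses the standard Haldane and Kosambi mapping functions, which convert a recombination fraction r into a map distance in centiMorgans, together with a deliberately simple, assumed error model in which an error flips the observed recombinant status of an interval with probability e. This error model and the helper names are illustrative assumptions, not the model used in the zebra finch study.

```python
import math
from typing import Callable, Iterable


def haldane_cm(r: float) -> float:
    """Haldane map distance (cM) for recombination fraction r, 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance (cM) for recombination fraction r, 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def apparent_fraction(r: float, e: float) -> float:
    """Assumed (hypothetical) error model: with probability e an error flips the
    observed recombinant status of one interval, so r' = r(1 - e) + (1 - r)e.
    Real errors at one marker affect both adjacent intervals; ignored here."""
    return r * (1.0 - e) + (1.0 - r) * e


def map_length_cm(
    fractions: Iterable[float], mapping: Callable[[float], float] = haldane_cm
) -> float:
    """Total map length: the sum of per-interval map distances."""
    return sum(mapping(r) for r in fractions)


true_r = [0.01] * 10
noisy_r = [apparent_fraction(r, 0.01) for r in true_r]
print(map_length_cm(true_r), map_length_cm(noisy_r))
```

    With ten intervals of r = 0.01 and e = 0.01, the apparent map length roughly doubles, from about 10 cM to about 20 cM, because every error-induced recombinant is added to the true ones and the errors accumulate over intervals.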

    An efficient parallel algorithm for haplotype inference based on rule based approach and consensus methods


    Accurate HLA type inference using a weighted similarity graph

    Background: The human leukocyte antigen system (HLA) contains many highly variable genes. HLA genes play an important role in the human immune system, and HLA gene matching is crucial for the success of human organ transplantation. Numerous studies have demonstrated that variation in HLA genes is associated with many autoimmune, inflammatory and infectious diseases. However, typing HLA genes by serology or PCR is time-consuming and expensive, which limits large-scale studies involving HLA genes. Since single nucleotide polymorphism (SNP) genotype data are much easier and cheaper to obtain, accurate computational algorithms for inferring HLA gene types from SNP genotype data are needed. To infer HLA types from SNP genotypes, the first step is to infer SNP haplotypes from genotypes. However, for the same SNP genotype data set, the haplotype configurations inferred by different methods are usually inconsistent, and it is often difficult to decide which one is true. Results: In this paper, we design an accurate HLA gene type inference algorithm that uses SNP genotype data from pedigrees, known HLA gene types of some individuals, and the relationship between inferred SNP haplotypes and HLA gene types. Given a set of haplotypes inferred from the genotypes of a population consisting of many pedigrees, the algorithm first constructs a weighted similarity graph based on a new haplotype similarity measure and derives constraint edges from known HLA gene types. Based on the principle that different HLA gene alleles should have different background haplotypes, the algorithm searches for an optimal labeling of all haplotypes with unknown HLA gene types such that the total weight among haplotypes of the same HLA gene type is maximized. To deal with ambiguous haplotype solutions, we use a genetic algorithm to select haplotype configurations that tend to maximize the same optimization criterion. Our experiments on a previously typed subset of the HapMap data show that the algorithm is highly accurate, achieving an accuracy of 96% for HLA-A, 95% for HLA-B, 97% for HLA-C, 84% for HLA-DRB1, 98% for HLA-DQA1 and 97% for HLA-DQB1 in a leave-one-out test. Conclusions: Our algorithm can accurately infer HLA gene types from neighboring SNP genotype data. Compared with a recent approach on the same input data, our algorithm achieved higher accuracy. The code of our algorithm is freely available upon request to the corresponding authors.
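    The labeling step can be sketched in a simplified form. The code below is not the published algorithm: its similarity measure (the length of the run of identical alleles around a focal SNP standing in for the HLA gene position) is a hypothetical stand-in for the paper's new haplotype similarity measure, it assigns each unlabeled haplotype greedily instead of searching for an optimal labeling, and it omits constraint edges and the genetic algorithm for ambiguous haplotype solutions. The HLA allele labels in the example are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def shared_run(a: Sequence[int], b: Sequence[int], focus: int) -> int:
    """Hypothetical similarity weight: number of consecutive identical alleles
    around SNP index `focus` (0 if the haplotypes differ at `focus`)."""
    if a[focus] != b[focus]:
        return 0
    left = focus
    while left > 0 and a[left - 1] == b[left - 1]:
        left -= 1
    right = focus
    while right + 1 < len(a) and a[right + 1] == b[right + 1]:
        right += 1
    return right - left + 1


def label_haplotypes(
    haps: Sequence[Sequence[int]], labels: Sequence[Optional[str]], focus: int
) -> List[Optional[str]]:
    """Greedy labeling: each haplotype with unknown type (None) receives the
    type whose known haplotypes have the largest total similarity weight to it.
    Haplotypes with zero total weight to every type stay unlabeled."""
    result = list(labels)
    for i, lab in enumerate(labels):
        if lab is not None:
            continue
        score: Dict[str, int] = defaultdict(int)
        for j, other in enumerate(labels):
            if other is not None:
                score[other] += shared_run(haps[i], haps[j], focus)
        if score:
            best = max(score, key=score.__getitem__)
            if score[best] > 0:
                result[i] = best
    return result


haps = [[0, 0, 1, 1, 0], [1, 1, 0, 1, 0], [0, 0, 1, 1, 1], [1, 1, 0, 0, 0]]
labels = ["A*01", "A*02", None, None]
print(label_haplotypes(haps, labels, focus=2))
```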

    Parsimony-based genetic algorithm for haplotype resolution and block partitioning

    This dissertation proposes a new algorithm for performing simultaneous haplotype resolution and block partitioning. The algorithm is based on a genetic algorithm approach and the parsimony principle. The multilocus LD measure Normalized Entropy Difference is used as the block identification criterion. The proposed algorithm incorporates missing data as part of the model and allows blocks of arbitrary length. In addition, the algorithm provides scores for the block boundaries that measure the strength of the boundaries at specific positions. The performance of the proposed algorithm was validated by running it on several publicly available data sets, including the HapMap data, and comparing the results to those of existing state-of-the-art algorithms. The results show that the haplotype decomposition accuracy of the proposed genetic algorithm is within the range achieved by the other algorithms. The block structure output by our algorithm generally agrees with the block structure provided by the other algorithms for the same data. Thus, the proposed algorithm can be successfully used for block partitioning and haplotype phasing while providing valuable new features, such as scores for block boundaries and fully incorporated treatment of missing data. In addition, the proposed algorithm for haplotyping and block partitioning is used in the development of a new clustering algorithm for two-population mixed genotype samples. The proposed clustering algorithm extracts two clusters with substantially different block structures from the given genotype sample and finds the haplotype resolution and block partitioning for each cluster.
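    The parsimony principle can be shown with an exhaustive pure-parsimony search: among all ways of resolving each genotype into two haplotypes, choose one that uses the fewest distinct haplotypes. The sketch below assumes a 0/1/2 genotype encoding (2 = heterozygous), ignores missing data and block partitioning, and enumerates every combination, so it only works for tiny inputs; the dissertation instead searches this space with a genetic algorithm. Function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Set, Tuple

Haplotype = Tuple[int, ...]


def resolutions(g: Sequence[int]) -> List[Tuple[Haplotype, Haplotype]]:
    """All unordered haplotype pairs explaining genotype g
    (0/1 = homozygous allele, 2 = heterozygous site)."""
    het = [i for i, x in enumerate(g) if x == 2]
    if not het:
        h = tuple(g)
        return [(h, h)]
    pairs = []
    # Fix allele 0 at the first heterozygous site of haplotype a so that
    # each unordered pair is listed once: 2**(k-1) pairs for k het sites.
    for bits in product((0, 1), repeat=len(het) - 1):
        a, b = list(g), list(g)
        for site, bit in zip(het, (0,) + bits):
            a[site], b[site] = bit, 1 - bit
        pairs.append((tuple(a), tuple(b)))
    return pairs


def pure_parsimony(
    genotypes: Sequence[Sequence[int]],
) -> Tuple[Set[Haplotype], List[Tuple[Haplotype, Haplotype]]]:
    """Exhaustively pick one resolution per genotype, minimizing the
    number of distinct haplotypes used (exponential time)."""
    best_used: Set[Haplotype] = set()
    best_choice: List[Tuple[Haplotype, Haplotype]] = []
    first = True
    for choice in product(*(resolutions(g) for g in genotypes)):
        used = {h for pair in choice for h in pair}
        if first or len(used) < len(best_used):
            best_used, best_choice, first = used, list(choice), False
    return best_used, best_choice


used, choice = pure_parsimony([(2, 2), (0, 2), (2, 0)])
print(sorted(used), choice)
```

    For genotypes 22, 02 and 20, resolving 22 as 01/10 reuses haplotypes already required by the other two genotypes, so the search returns three distinct haplotypes instead of four.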

    Fast and Accurate Haplotype Inference with Hidden Markov Model

    The genomes of humans and other diploid organisms consist of paired chromosomes. Haplotype information (the combination of alleles on a single chromosome), which is crucial for disease association analysis and population genetic inference, among other applications, is hidden in the data that modern high-throughput technologies generate for diploid organisms (including humans), because these technologies cannot distinguish between the two homologous chromosomes. Here, I consider the haplotype inference problem in two common scenarios of genetic studies: 1. model organisms (such as laboratory mice), in which individuals are bred through a prescribed pedigree design; and 2. outbred organisms (such as humans), in which individuals (mostly unrelated) are drawn from one or more populations or continental groups. In both scenarios, an individual may share short chromosomal blocks with other individuals or with founders, if available. I have developed and implemented methods that identify these shared blocks statistically in order to reconstruct the haplotypes of the individuals under study accurately and more rapidly, and to solve important related problems, including genotype imputation and ancestry inference. My methods, based on hidden Markov models, can scale up to tens of thousands of individuals. Analysis based on my method has led to a new genetic map of the mouse population that reveals important biological properties of the recombination process. I have also explored study design and empirical quality control for imputation tasks with large-scale datasets from admixed populations.
    Doctor of Philosophy
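    The idea of reconstructing a haplotype from shared blocks with a hidden Markov model can be sketched with a Li-Stephens-style copying model, a standard formulation that the thesis's own models may not match. Here the hidden state is the index of the reference haplotype being copied, a switch to another reference (probability switch_prob) marks a block boundary, and a mismatch with the copied reference has probability error_prob. The sketch assumes a single already-phased target haplotype and a small reference panel; it does not handle unphased genotypes, pedigrees, or imputation, and its parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def viterbi_copying(
    target: Sequence[int],
    refs: Sequence[Sequence[int]],
    switch_prob: float = 0.01,
    error_prob: float = 0.01,
) -> List[int]:
    """Most likely reference haplotype copied at each SNP under a simple
    copying HMM (hidden state = index of the reference being copied)."""
    k, n = len(refs), len(target)
    if k == 0 or n == 0:
        return []
    log_stay = math.log(1.0 - switch_prob)
    # Jumping to any particular other reference shares the switch probability.
    log_jump = math.log(switch_prob / (k - 1)) if k > 1 else -math.inf
    log_match, log_mismatch = math.log(1.0 - error_prob), math.log(error_prob)

    def emit(j: int, i: int) -> float:
        return log_match if refs[j][i] == target[i] else log_mismatch

    score = [math.log(1.0 / k) + emit(j, 0) for j in range(k)]  # uniform start
    back: List[List[int]] = []  # back[i - 1][j]: best predecessor of state j at site i
    for i in range(1, n):
        new_score, ptr = [], []
        for j in range(k):
            # O(k^2) per site for clarity; large panels need an O(k) update.
            cand = [score[prev] + (log_stay if prev == j else log_jump) for prev in range(k)]
            prev_best = max(range(k), key=cand.__getitem__)
            new_score.append(cand[prev_best] + emit(j, i))
            ptr.append(prev_best)
        score = new_score
        back.append(ptr)
    # Backtrack from the best final state.
    path = [0] * n
    path[-1] = max(range(k), key=score.__getitem__)
    for i in range(n - 1, 0, -1):
        path[i - 1] = back[i - 1][path[i]]
    return path


refs = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
print(viterbi_copying([0, 0, 0, 1, 1, 1], refs))
```

    A change in the decoded reference index marks the boundary of a shared block; with a low switch_prob, one switch is far cheaper than several consecutive mismatches.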