
    Boosting Haplotype Inference with Local Search

    Abstract. A very challenging problem in the genetics domain is to infer haplotypes from genotypes. This process is expected to identify genes affecting health, disease and response to drugs. One of the approaches to haplotype inference aims to minimise the number of different haplotypes used, and is known as haplotype inference by pure parsimony (HIPP). The HIPP problem is computationally difficult, being NP-hard. Recently, a SAT-based method (SHIPs) has been proposed to solve the HIPP problem. This method iteratively considers an increasing number of haplotypes, starting from an initial lower bound. Hence, one important aspect of SHIPs is the lower bounding procedure, which reduces the number of iterations of the basic algorithm, and also indirectly simplifies the resulting SAT model. This paper describes the use of local search to improve existing lower bounding procedures. The new lower bounding procedure is guaranteed to be as tight as the existing procedures. In practice the new procedure is in most cases considerably tighter, allowing significant improvement of performance on challenging problem instances.
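
The iterative scheme in this abstract (try an increasing number of haplotypes, starting from a lower bound) can be illustrated with a minimal sketch. This is not the SHIPs method: brute-force enumeration stands in for the SAT solver, and the genotype encoding (0 and 1 for homozygous sites, 2 for heterozygous sites) and all function names are assumptions made for illustration.

```python
from itertools import combinations, product

def explains(h1, h2, g):
    """True if haplotypes h1 and h2 together explain genotype g.
    Assumed encoding: 0/1 = homozygous allele, 2 = heterozygous site."""
    for a, b, s in zip(h1, h2, g):
        if s == 2:
            if a == b:          # heterozygous site needs differing alleles
                return False
        elif not (a == b == s): # homozygous site needs both alleles equal to s
            return False
    return True

def compatible_haplotypes(genotypes):
    """All haplotypes that could take part in explaining some genotype."""
    cands = set()
    for g in genotypes:
        options = [(0, 1) if s == 2 else (s,) for s in g]
        cands.update(product(*options))
    return sorted(cands)

def covers(haps, genotypes):
    """True if every genotype is explained by some pair drawn from haps."""
    return all(
        any(explains(h1, h2, g) for h1 in haps for h2 in haps)
        for g in genotypes
    )

def hipp(genotypes, lower_bound=1):
    """Smallest haplotype set explaining all genotypes (exponential time).
    The result is minimal only if lower_bound is a valid lower bound."""
    cands = compatible_haplotypes(genotypes)
    for r in range(lower_bound, len(cands) + 1):
        for haps in combinations(cands, r):
            if covers(haps, genotypes):
                return r, haps
    return None

genotypes = [(2, 2, 0), (0, 2, 0), (2, 0, 0)]
print(hipp(genotypes))                 # search starts at r = 1
print(hipp(genotypes, lower_bound=3))  # a tighter bound skips r = 1 and r = 2
```

A tighter lower bound removes the failed iterations at small r, which is the role the abstract assigns to the lower bounding procedure.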

    Efficient Haplotype Inference with Pseudo-Boolean Optimization

    Abstract. Haplotype inference from genotype data is a key computational problem in bioinformatics, since directly retrieving haplotype information from DNA samples is not feasible using existing technology. One of the methods for solving this problem uses the pure parsimony criterion, an approach known as Haplotype Inference by Pure Parsimony (HIPP). Initial work in this area was based on a number of different Integer Linear Programming (ILP) models and branch and bound algorithms. Recent work has shown that the utilization of a Boolean Satisfiability (SAT) formulation and state of the art SAT solvers represents the most efficient approach for solving the HIPP problem. Motivated by the promising results obtained using SAT techniques, this paper investigates the utilization of modern Pseudo-Boolean Optimization (PBO) algorithms for solving the HIPP problem. The paper starts by applying PBO to existing ILP models. The results are promising, and motivate the development of a new PBO model (RPoly) for the HIPP problem, which has a compact representation and eliminates key symmetries. Experimental results indicate that RPoly outperforms the SAT-based approach on most problem instances, being, in general, significantly more efficient.

    Pure Parsimony Xor Haplotyping

    The haplotype resolution from xor-genotype data has been recently formulated as a new model for genetic studies. The xor-genotype data is a cheaply obtainable type of data distinguishing heterozygous from homozygous sites without identifying the homozygous alleles. In this paper we propose a formulation based on a well-known model used in haplotype inference: pure parsimony. We exhibit exact solutions of the problem by providing polynomial time algorithms for some restricted cases and a fixed-parameter algorithm for the general case. These results are based on some interesting combinatorial properties of a graph representation of the solutions. Furthermore, we show that the problem has a polynomial time k-approximation, where k is the maximum number of xor-genotypes containing a given SNP. Finally, we propose a heuristic and produce an experimental analysis showing that it scales to real-world large instances taken from the HapMap project
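
To make the xor-genotype concrete: it records, per site, only whether the two haplotypes differ. The sketch below assumes binary haplotypes stored as tuples; the function name is hypothetical and the code is not taken from the paper.

```python
def xor_genotype(h1, h2):
    """Per-site XOR of two binary haplotypes:
    1 marks a heterozygous site, 0 a homozygous site (allele unknown)."""
    return tuple(a ^ b for a, b in zip(h1, h2))

# Two different haplotype pairs that yield the same xor-genotype,
# because the homozygous alleles are not identified.
x1 = xor_genotype((0, 0, 1), (0, 1, 1))
x2 = xor_genotype((1, 0, 0), (1, 1, 0))
print(x1, x2)
```

Both pairs map to the same xor-genotype, which is why resolving haplotypes from this data needs an extra criterion such as pure parsimony.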

    Origin and genetic diversity of diploid parthenogenetic Artemia in Eurasia

    There is wide interest in understanding how genetic diversity is generated and maintained in parthenogenetic lineages, as it will help clarify the debate on the evolution and maintenance of sexual reproduction. There are three mechanisms that can be responsible for the generation of genetic diversity in parthenogenetic lineages: contagious parthenogenesis, repeated hybridization and microorganism infections (e.g. Wolbachia). Brine shrimps of the genus Artemia (Crustacea, Branchiopoda, Anostraca) are a good model system to investigate evolutionary transitions between reproductive systems as they include sexual species and lineages of obligate parthenogenetic populations of different ploidy levels, which often co-occur. Diploid parthenogenetic lineages occasionally produce rare, fully functional males, and interspecific hybridization is known to occur, but the mechanisms of origin of asexual lineages are not completely understood. Here we sequenced and analysed fragments of one mitochondrial and two nuclear genes from an extensive set of populations of diploid parthenogenetic Artemia and sexual species from Central and East Asia to investigate the evolutionary origin of diploid parthenogenetic Artemia, and the geographic origin of the parental taxa. Our results indicate that there are at least two, possibly three, independent and recent maternal origins of parthenogenetic lineages, related to A. urmiana and Artemia sp. from Kazakhstan, but that the nuclear genes are very closely related in all the sexual species and parthenogenetic lineages except for A. sinica, which presumably took no part in the origin of diploid parthenogenetic strains. Our data cannot rule out either hybridization between any of the very closely related Asiatic sexual species or rare events of contagious parthenogenesis via rare males as the contributing mechanisms to the generation of genetic diversity in diploid parthenogenetic Artemia lineages.

    Local Population Structure and Patterns of Western Hemisphere Dispersal for Coccidioides spp., the Fungal Cause of Valley Fever.

    Coccidioidomycosis (or valley fever) is a fungal disease with high morbidity and mortality that affects tens of thousands of people each year. This infection is caused by two sibling species, Coccidioides immitis and C. posadasii, which are endemic to specific arid locales throughout the Western Hemisphere, particularly the desert southwest of the United States. Recent epidemiological and population genetic data suggest that the geographic range of coccidioidomycosis is expanding, as new endemic clusters have been identified in the state of Washington, well outside the established endemic range. The genetic mechanisms and epidemiological consequences of this expansion are unknown and require better understanding of the population structure and evolutionary history of these pathogens. Here we performed multiple phylogenetic inference and population genomics analyses of 68 new and 18 previously published genomes. The results provide evidence of substantial population structure in C. posadasii and demonstrate the presence of distinct geographic clades in central and southern Arizona as well as dispersed populations in Texas, Mexico, South America, and Central America. Although a smaller number of C. immitis strains were included in the analyses, some evidence of phylogeographic structure was also detected in this species, which has been historically limited to California and Baja, Mexico. Bayesian analyses indicated that C. posadasii is the more ancient of the two species and that Arizona contains the most diverse subpopulations. We propose a southern Arizona-northern Mexico origin for C. posadasii and describe a pathway for dispersal and distribution out of this region. Importance: Coccidioidomycosis, or valley fever, is caused by the pathogenic fungi Coccidioides posadasii and C. immitis. The fungal species and disease are primarily found in the American desert southwest, with spotted distribution throughout the Western Hemisphere. Initial molecular studies suggested a likely anthropogenic movement of C. posadasii from North America to South America. Here we comparatively analyze eighty-six genomes of the two Coccidioides species and establish local and species-wide population structures to not only clarify the earlier dispersal hypothesis but also provide evidence of likely ancestral populations and patterns of dispersal for the known subpopulations of C. posadasii.