A Column Generation Approach for Pure Parsimony Haplotyping

Author: Dal Sasso Veronica
De Giovanni Luigi
Publication venue: OASIcs - OpenAccess Series in Informatics. 5th Student Conference on Operational Research (SCOR 2016)
Publication date: 01/01/2016

Dagstuhl Research Online Publication Server

Disease progression in Plasmodium knowlesi malaria is linked to variation in invasion gene family members.

Emerging pathogens undermine initiatives to control the global health impact of infectious diseases. Zoonotic malaria is no exception. Plasmodium knowlesi, a malaria parasite of Southeast Asian macaques, has entered the human population. P. knowlesi, like Plasmodium falciparum, can reach high parasitaemia in human infections, and the World Health Organization guidelines for severe malaria list hyperparasitaemia among the measures of severe malaria in both infections. Not all patients with P. knowlesi infections develop hyperparasitaemia, and it is important to determine why. Between-isolate variability in erythrocyte invasion efficiency seems key. Here we investigate the idea that particular alleles of two P. knowlesi erythrocyte invasion genes, P. knowlesi normocyte binding protein Pknbpxa and Pknbpxb, influence parasitaemia and human disease progression. Pknbpxa and Pknbpxb reference DNA sequences were generated from five geographically and temporally distinct P. knowlesi patient isolates. Polymorphic regions of each gene (approximately 800 bp) were identified by haplotyping 147 patient isolates at each locus. Parasitaemia in the study cohort was associated with markers of disease severity including liver and renal dysfunction, haemoglobin, platelets and lactate (r ≥ 0.34, p < 0.0001 for all). Seventy-five Pknbpxa and 51 Pknbpxb haplotypes were resolved in 138 (94%) and 134 (92%) patient isolates, respectively. The haplotypes formed twelve Pknbpxa and two Pknbpxb allelic groups. Patients infected with parasites with particular Pknbpxa and Pknbpxb alleles within the groups had significantly higher parasitaemia and other markers of disease severity. Our study strongly suggests that P. knowlesi invasion gene variants contribute to parasite virulence. We focused on two invasion genes, and we anticipate that additional virulent loci will be identified in pathogen genome-wide studies. The multiple sustained entries of this diverse pathogen into the human population must give cause for concern to malaria elimination strategists in the Southeast Asian region.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

Publikationsserver der Universität Tübingen

PubMed Central

Unimas Institutional Repository

Enlighten

St George's Online Research Archive

University of St. Andrews - Pure

St Andrews Research Repository

FigShare

An efficient parallel algorithm for haplotype inference based on rule based approach and consensus methods.

Author: Saeed Qamar
Publication venue: University of Windsor Leddy Library
Publication date: 01/01/2007

Scholarship at UWindsor


Maximum-parsimony haplotype frequencies inference based on a joint constrained sparse representation of pooled DNA

Author: Anastassiou Dimitris
Iliadis Alexandros
Jajamovich Guido H.
Wang Xiaodong
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2013

Background: DNA pooling constitutes a cost-effective alternative in genome-wide association studies. In DNA pooling, equimolar amounts of DNA from different individuals are mixed into one sample and the frequency of each allele in each position is observed in a single genotype experiment. The identification of haplotype frequencies from pooled data, in addition to single-locus analysis, is of separate interest within these studies, as haplotypes could increase statistical power and provide additional insight. Results: We developed a method for maximum-parsimony haplotype frequency estimation from pooled DNA data based on the sparse representation of the DNA pools in a dictionary of haplotypes. Extensions to scenarios where data are noisy or even missing are also presented. The resulting method is first applied to simulated data based on the haplotypes and their associated frequencies of the AGT gene. We further evaluate our methodology on datasets consisting of SNPs from the first 7 Mb of the HapMap CEU population. Noise and missing data were further introduced in the datasets in order to test the extensions of the proposed method. Both HIPPO and HAPLOPOOL were also applied to these datasets to compare performance. Conclusions: We evaluate our methodology on scenarios where pooling is more efficient relative to individual genotyping; that is, in datasets that contain pools with a small number of individuals. We show that in such scenarios our methodology outperforms state-of-the-art methods such as HIPPO and HAPLOPOOL.

Columbia University Academic Commons

Springer - Publisher Connector

PubMed Central

Algorithms For Haplotype Inference And Block Partitioning

Author: Vijaya Satya Ravi
Publication venue: Information Bulletin on Variable Stars (IBVS)
Publication date: 01/01/2006

The completion of the human genome project in 2003 paved the way for studies to better understand and catalog variation in the human genome. The International HapMap Project was started in 2002 with the aim of identifying genetic variation in the human genome and studying the distribution of genetic variation across populations of individuals. The information collected by the HapMap project will enable researchers to associate genetic variations with phenotypic variations. Single Nucleotide Polymorphisms (SNPs) are loci in the genome where two individuals differ in a single base. It is estimated that there are approximately ten million SNPs in the human genome. These ten million SNPs are not completely independent of each other - blocks (contiguous regions) of neighboring SNPs on the same chromosome are inherited together. The pattern of SNPs on a block of the chromosome is called a haplotype. Each block might contain a large number of SNPs, but a small subset of these SNPs is sufficient to uniquely identify each haplotype in the block. The haplotype map or HapMap is a map of these haplotype blocks. Haplotypes, rather than individual SNP alleles, are expected to affect a disease phenotype. The human genome is diploid, meaning that in each cell there are two copies of each chromosome - i.e., each individual has two haplotypes in any region of the chromosome. With the current technology, the cost of empirically collecting haplotype data is prohibitive. Therefore, unordered bi-allelic genotype data is collected experimentally instead. The genotype data gives the two alleles at each SNP locus in an individual, but does not give information about which allele is on which copy of the chromosome. This necessitates computational techniques for inferring haplotypes from genotype data. This computational problem is called the haplotype inference problem. Many statistical approaches have been developed for the haplotype inference problem. 
Some of these statistical methods have been shown to be reasonably accurate on real genotype data. However, these techniques are computationally intensive. With the International HapMap Project collecting information from nearly 10 million SNPs, and with association studies involving thousands of individuals being undertaken, there is a need for more efficient methods for haplotype inference. This dissertation is an effort to develop efficient perfect phylogeny based combinatorial algorithms for haplotype inference. The perfect phylogeny haplotyping (PPH) problem is to derive a set of haplotypes for a given set of genotypes with the condition that the haplotypes describe a perfect phylogeny. The perfect phylogeny approach to haplotype inference is applicable to the human genome due to the block structure of the human genome. An important contribution of this dissertation is an optimal O(nm) time algorithm for the PPH problem, where n is the number of genotypes and m is the number of SNPs involved. The complexity of the earlier algorithms for this problem was O(nm^2). The O(nm) complexity was achieved by applying some transformations on the input data and by making use of the FlexTree data structure, developed as part of this dissertation work, which represents all the possible PPH solutions for a given set of genotypes. Real genotype data does not always admit a perfect phylogeny, even within a block of the human genome. Therefore, it is necessary to extend the perfect phylogeny approach to accommodate deviations from perfect phylogeny. Deviations from perfect phylogeny might occur because of recombination events and repeated or back mutations (also referred to as homoplasy events). Another contribution of this dissertation is a set of fixed-parameter tractable algorithms for constructing near-perfect phylogenies with homoplasy events. 
For the problem of constructing a near-perfect phylogeny with q homoplasy events, the algorithm presented here takes O(nm^2+m^(n+m)) time. Empirical analysis on simulated data shows that this algorithm produces more accurate results than PHASE (a popular haplotype inference program), while being approximately 1000 times faster than PHASE. Another important problem when dealing with real genotype or haplotype data is the presence of missing entries. The Incomplete Perfect Phylogeny (IPP) problem is to construct a perfect phylogeny on a set of haplotypes with missing entries. The Incomplete Perfect Phylogeny Haplotyping (IPPH) problem is to construct a perfect phylogeny on a set of genotypes with missing entries. Both the IPP and IPPH problems have been shown to be NP-hard. The earlier approaches for both of these problems dealt with restricted versions of the problem, where the root is either available or can be trivially reconstructed from the data, or certain assumptions were made about the data. We make some novel observations about these problems, and present efficient algorithms for unrestricted versions of these problems. The algorithms have worst-case exponential time complexity, but have been shown to be very fast on practical instances of the problem.
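
The ambiguity behind the haplotype inference problem described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it simply enumerates, by brute force, every unordered haplotype pair that could explain a single genotype. The 0/1/2 genotype encoding (0 and 1 for the two homozygous states, 2 for a heterozygous site) and the function name `compatible_haplotype_pairs` are assumptions made for illustration.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs that explain one genotype.

    Assumed encoding (illustrative, not from the source):
      0 = homozygous for allele 0, 1 = homozygous for allele 1,
      2 = heterozygous (one copy of each allele, phase unknown).
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    pairs = set()
    # Try every assignment of alleles to the first haplotype at the
    # heterozygous sites; the second haplotype gets the complement.
    for bits in product((0, 1), repeat=len(het_sites)):
        h1 = list(genotype)
        h2 = list(genotype)
        for site, b in zip(het_sites, bits):
            h1[site] = b
            h2[site] = 1 - b
        # frozenset makes the pair unordered, so (h1, h2) and (h2, h1)
        # are counted once.
        pairs.add(frozenset((tuple(h1), tuple(h2))))
    return pairs


if __name__ == "__main__":
    for pair in compatible_haplotype_pairs((0, 2, 2)):
        print(sorted(pair))
```

A genotype with k heterozygous sites (k ≥ 1) has 2^(k-1) such pairs, which is why inference methods need an extra criterion, such as a perfect phylogeny or parsimony, to choose among them.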

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Mapping the breast cancer metastatic cascade onto ctDNA using genetic and epigenetic clonal tracking.

Author: Ashworth A
Barry P
Cresswell GD
Heide T
Magnani L
Maley CC
Nichol D
Schiavon G
Sottoriva A
Spiteri I
Tari H
Zapata L
Publication venue: Springer Science and Business Media LLC
Publication date: 18/02/2020

Circulating tumour DNA (ctDNA) allows tracking of the evolution of human cancers at high resolution, overcoming many limitations of tissue biopsies. However, exploiting ctDNA to determine how a patient's cancer is evolving in order to aid clinical decisions remains difficult. This is because ctDNA is a mix of fragmented alleles, and the contribution of different cancer deposits to ctDNA is largely unknown. Profiling ctDNA almost invariably requires prior knowledge of what genomic alterations to track. Here, we leverage a rapid autopsy programme to demonstrate that unbiased genomic characterisation of several metastatic sites and concomitant ctDNA profiling at whole-genome resolution reveal the extent to which ctDNA is representative of widespread disease. We also present a methylation profiling method that allows tracking evolutionary changes in ctDNA at single-molecule resolution without prior knowledge. These results have critical implications for the use of liquid biopsies to monitor cancer evolution in humans and guide treatment.

Spiral - Imperial College Digital Repository

Institute of Cancer Research Repository

Methods for Viral Intra-Host and Inter-Host Data Analysis for Next-Generation Sequencing Technologies

Author: Knyazev Sergey
Publication venue: ScholarWorks @ Georgia State University
Publication date: 10/08/2021

The deep coverage offered by next-generation sequencing (NGS) technology has facilitated the reconstruction of intra-host RNA viral populations at an unprecedented level of detail. However, NGS data requires sophisticated analysis dealing with millions of error-prone short reads. This dissertation first reviews the challenges and methods for viral genomic data analysis in the NGS era. Second, it presents a software tool, CliqueSNV, for inferring viral quasispecies based on extracting pairs of statistically linked mutations from noisy reads, which effectively reduces sequencing noise and enables identifying minority haplotypes with a frequency below the sequencing error rate. Finally, the dissertation describes the algorithms VOICE and MinDistB for inference of relatedness between viral samples, identification of transmission clusters, and sources of infection.

ScholarWorks @ Georgia State University