290 research outputs found

    Approximating Clustering of Fingerprint Vectors with Missing Values

    The problem of clustering fingerprint vectors is a problem in Computational Biology proposed in (Figueroa et al. 2004). In this paper we narrow the gaps between the known lower and upper bounds on the approximability of some variants of the biological problem. Namely, we prove that the problem is APX-hard even when each fingerprint contains only two unknown positions. Moreover, we study some variants of the original problem and give two 2-approximation algorithms for the IECMV and OECMV problems when the number of unknown entries in each vector is at most a constant. Comment: 13 pages, 4 figures

    Origin of rat β-globin haplotypes containing three and five genes

    We have reported three adult β-gene haplotypes in rat, containing either five or three genes. Detailed sequence analysis reveals that the leftmost gene is the major gene and that the minor gene lies downstream at the opposite end. All of the genes lying between them are minor-major hybrids, indicating their origin by unequal crossing-over. In two haplotypes, β-globin genes were found with an L1 element inserted directly into IVS2. The described results allow the formulation of a pathway of mutational events leading from the ancient two-β-gene rodent ancestor through a three-gene haplotype to five-gene haplotypes, one of which is postulated to have arisen in common laboratory strains since their capture in the wild. [https://academic.oup.com/mbe/article/7/5/407/1061225]

    Significant abundance of cis configurations of coding variants in diploid human genomes

    To fully understand human genetic variation and its functional consequences, the specific distribution of variants between the two chromosomal homologues of genes must be known. The 'phase' of variants can significantly impact gene function and phenotype. To assess patterns of phase at large scale, we have analyzed 18 121 autosomal genes in 1092 statistically phased genomes from the 1000 Genomes Project and 184 experimentally phased genomes from the Personal Genome Project. Here we show that genes with cis-configurations of coding variants are more frequent than genes with trans-configurations in a genome, with global cis/trans ratios of ∼60:40. Significant cis-abundance was observed in virtually all genomes in all populations. Moreover, we identified a large group of genes exhibiting cis-configurations of protein-changing variants in excess, so-called 'cis-abundant genes', and a smaller group of 'trans-abundant genes'. These two gene categories were functionally distinguishable and exhibited strikingly different distributional patterns of protein-changing variants. Underlying these phenomena was a shared set of phase-sensitive genes of importance for adaptation and evolution. This work establishes common patterns of phase as key characteristics of diploid human exomes and provides evidence for their functional significance, highlighting the importance of phase for the interpretation of protein-coding genetic variation and gene function.
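To make the cis/trans distinction in this abstract concrete, here is a minimal sketch of how a gene might be labelled from phased genotypes. It is an illustration under stated assumptions, not the authors' pipeline: genotypes are written as VCF-style phased strings (`0|1`, `1|0`), a gene is called "cis" when all of its heterozygous coding variants sit on the same homologue and "trans" otherwise, and genes with fewer than two heterozygous variants are left unclassified. The function names and the toy gene names are hypothetical.

```python
def classify_gene(phased_genotypes):
    """Label one gene as 'cis', 'trans', or None (not classifiable).

    phased_genotypes: list of VCF-style phased strings, one per coding
    variant in the gene (assumed input format, e.g. "0|1" or "1|0").
    """
    # Only heterozygous variants carry phase information.
    hets = [gt for gt in phased_genotypes if gt in ("0|1", "1|0")]
    if len(hets) < 2:
        return None
    # All alternate alleles on one homologue -> cis; split across both -> trans.
    return "cis" if len(set(hets)) == 1 else "trans"


def cis_trans_fractions(genes):
    """Return (cis fraction, trans fraction) over classifiable genes, or None."""
    labels = [classify_gene(gts) for gts in genes.values()]
    cis, trans = labels.count("cis"), labels.count("trans")
    total = cis + trans
    if total == 0:
        return None
    return cis / total, trans / total


# Toy, made-up genome: gene name -> phased genotypes of its coding variants.
toy_genome = {
    "GENE_A": ["0|1", "0|1"],         # both variants on the same homologue
    "GENE_B": ["0|1", "1|0"],         # variants on opposite homologues
    "GENE_C": ["1|0", "1|0", "1|0"],  # three variants, same homologue
    "GENE_D": ["0|1"],                # a single variant: no configuration
    "GENE_E": ["1|1", "0|1"],         # only one heterozygous variant
}
```

Calling `cis_trans_fractions(toy_genome)` on a real phased genome would give the per-genome figure that the abstract summarises as a global ratio; the toy data here are invented and say nothing about the reported ∼60:40 value.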

    Untreated PKU patients without intellectual disability: SHANK gene family as a candidate modifier

    Phenylketonuria (PKU) is an inborn error of metabolism caused by variants in the phenylalanine hydroxylase (PAH) gene and is characterized by excessively high levels of phenylalanine in body fluids. PKU is a paradigm for a genetic disease that can be treated, and the majority of developed countries have population-based newborn screening. Thus, the combination of early diagnosis and immediate initiation of treatment has resulted in normal intelligence for treated PKU patients. Although PKU is a monogenic disease, decades of research and clinical practice have shown that the correlation between the genotype and the corresponding phenotype is not simple. Attempts have been made to discover modifier genes for the PKU cognitive phenotype, but without success so far. We conducted whole genome sequencing of 4 subjects from unrelated non-consanguineous families who presented with pathogenic mutations in the PAH gene, high blood phenylalanine concentrations and near-normal cognitive development despite no treatment. We used cross-sample analysis to select genes common to more than one patient. The SHANK gene family emerged as the only relevant gene family, with variants detected in 3 of the 4 analyzed patients. We detected two novel variants, p.Pro1591Ala in SHANK1 and p.Asp18Asn in SHANK2, as well as the previously described variants SHANK2:p.Gly46Ser, SHANK2:p.Pro1388_Phe1389insLeuPro and SHANK3:p.Pro1716Thr. Computational analysis indicated that the identified variants do not abolish the function of SHANK proteins. However, changes in posttranslational modifications of SHANK proteins could influence the functioning of glutamatergic synapses and cytoskeleton regulation, and contribute to maintaining optimal synaptic density and number of dendritic spines. Our findings link the SHANK gene family and brain plasticity in PKU for the first time. We hypothesize that variant SHANK proteins maintain optimal synaptic density and number of dendritic spines under high concentrations of phenylalanine and could have a protective modifying effect on the cognitive development of PKU patients.

    Sequencing by Hybridization of Long Targets

    Sequencing by Hybridization (SBH) reconstructs an n-long target DNA sequence from its biochemically determined l-long subsequences. In the standard approach, the length of a uniformly random sequence that can be unambiguously reconstructed is limited, because repetitive subsequences cause reconstruction degeneracies. We present a modified sequencing method that overcomes this limitation without the need for different types of biochemical assays and is robust to error.
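The reconstruction degeneracy mentioned in this abstract can be illustrated with a small sketch of the standard (unmodified) SBH setting. This is not the modified method the paper proposes. It assumes the spectrum is the set of l-long substrings present in the target, that the target length n is known, and that no measurement errors occur; the function names and example sequences are made up for illustration. The search is a brute-force backtracking suitable only for toy inputs.

```python
def spectrum(target, l):
    """Set of all l-long substrings (l-mers) occurring in the target."""
    return {target[i:i + l] for i in range(len(target) - l + 1)}


def reconstructions(spec, l, n, alphabet="ACGT"):
    """Enumerate every n-long sequence whose l-mer set equals `spec`."""
    if l < 2 or n < l:
        raise ValueError("this sketch requires 2 <= l <= n")
    found = set()

    def extend(prefix, used):
        if len(prefix) == n:
            # Accept only if every l-mer of the spectrum has been used.
            if used == spec:
                found.add(prefix)
            return
        for base in alphabet:
            # The next l-mer overlaps the current suffix by l - 1 bases.
            probe = prefix[-(l - 1):] + base
            if probe in spec:
                extend(prefix + base, used | {probe})

    for start in spec:
        extend(start, {start})
    return found


# No (l-1)-mer repeats: the spectrum determines the sequence.
unique = reconstructions(spectrum("ACGTTGCA", 3), 3, 8)

# "AC" and "GT" each occur twice, so segments between them can be
# rearranged without changing the 3-mer spectrum.
ambiguous = reconstructions(spectrum("ACAGTCACTGT", 3), 3, 11)
```

Here `unique` contains only the original sequence, while `ambiguous` contains the original together with other sequences sharing the same spectrum, which is the kind of degeneracy that limits standard SBH on longer targets.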

    Identification of FVIII gene mutations in patients with hemophilia A using new combinatorial sequencing by hybridization

    Background: Standard methods of mutation detection in Hemophilia A (HA) are time consuming, which makes them unavailable for some analyses such as prenatal diagnosis. Objectives: To evaluate the feasibility of combinatorial sequencing-by-hybridization (cSBH) as an alternative and reliable tool for mutation detection in the FVIII gene. Patients/Methods: We have applied a new method of cSBH that uses two different colors for detection of multiple point mutations in the FVIII gene. The 26 exons encompassing the HA gene were analyzed in 7 newly diagnosed Italian patients and in 19 previously characterized individuals with FVIII deficiency. Results: Data show that, when solution-phase TAMRA- and QUASAR-labeled 5-mer oligonucleotide sets mixed with unlabeled target PCR templates are co-hybridized in the presence of DNA ligase to universal 6-mer oligonucleotide probe-based arrays, a number of mutations can be successfully detected. The technique was also reliable in identifying a mutant FVIII allele in an obligate heterozygote. A novel missense mutation (Leu1843Thr) in exon 16 and three novel neutral polymorphisms are presented with an updated protocol for 2-color cSBH. Conclusions: cSBH is a reliable tool for mutation detection in the FVIII gene and may represent a complementary method for the genetic screening of HA patients.

    Identification of cancer predisposition variants in apparently healthy individuals using a next-generation sequencing-based family genomics approach

    Cancer, like many common disorders, has a complex etiology, often with a strong genetic component and with multiple environmental factors contributing to susceptibility. A considerable number of genomic variants have been previously reported to be causative of, or associated with, an increased risk for various types of cancer. Here, we adopted a next-generation sequencing approach in 11 members of two families of Greek descent to identify all genomic variants with the potential to predispose family members to cancer. Cross-comparison with data from the Human Gene Mutation Database identified a total of 571 variants, of which 47% were disease-associated polymorphisms, 26% disease-associated polymorphisms with additional supporting functional evidence, 19% functional polymorphisms with in vitro/laboratory or in vivo supporting evidence but no known disease association, 4% putative disease-causing mutations but with some residual doubt as to their pathological significance, and 3% disease-causing mutations. Subsequent analysis, focused on the latter variant class most likely to be involved in cancer predisposition, revealed two variants of prime interest, namely MSH2 c.2732T>A (p.L911R) and BRCA1 c.2955delC, the first of which is novel. KMT2D c.13895delC and c.1940C>A variants are additionally reported as incidental findings. The next-generation sequencing-based family genomics approach described herein has the potential to be applied to other types of complex genetic disorder in order to identify variants of potential pathological significance.

    Detection and phasing of single base de novo mutations in biopsies from human in vitro fertilized embryos by advanced whole-genome sequencing

    Currently, the methods available for preimplantation genetic diagnosis (PGD) of in vitro fertilized (IVF) embryos do not detect de novo single-nucleotide and short indel mutations, which have been shown to cause a large fraction of genetic diseases. Detection of all these types of mutations requires whole-genome sequencing (WGS). In this study, advanced massively parallel WGS was performed on three 5- to 10-cell biopsies from two blastocyst-stage embryos. Both parents and paternal grandparents were also analyzed to allow for accurate measurements of false-positive and false-negative error rates. Overall, >95% of each genome was called. In the embryos, experimentally derived haplotypes and barcoded read data were used to detect and phase up to 82% of de novo single base mutations with a false-positive rate of about one error per Gb, resulting in fewer than 10 such errors per embryo. This represents a ∼100-fold lower error rate than previously published from 10 cells, and it is the first demonstration that advanced WGS can be used to accurately identify these de novo mutations in spite of the thousands of false-positive errors introduced by the extensive DNA amplification required for deep sequencing. Using haplotype information, we also demonstrate how small de novo deletions could be detected. These results suggest that phased WGS using barcoded DNA could be used in the future as part of the PGD process to maximize comprehensiveness in detecting disease-causing mutations and to reduce the incidence of genetic diseases. Authors: Brock A. Peters, Bahram G. Kermani, Oleg Alferov, Misha R. Agarwal, Mark A. McElwain, Natali Gulbahce, Daniel M. Hayden, Y. Tom Tang, Rebecca Yu Zhang, Rick Tearle, Birgit Crain, Renata Prates, Alan Berkeley, Santiago Munné and Radoje Drmanac