542 research outputs found

    Approximating Clustering of Fingerprint Vectors with Missing Values

    The problem of clustering fingerprint vectors is an interesting problem in Computational Biology that was proposed in (Figueroa et al. 2004). In this paper we narrow the gaps between the known lower and upper bounds on the approximability of some variants of the biological problem. Namely, we prove that the problem is APX-hard even when each fingerprint contains only two unknown positions. Moreover, we study some variants of the original problem, and we give two 2-approximation algorithms for the IECMV and OECMV problems when the number of unknown entries for each vector is at most a constant. Comment: 13 pages, 4 figures
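
    To make the notion of fingerprints with unknown positions concrete, here is a minimal sketch. It assumes binary fingerprints written as strings, with the hypothetical symbol `N` marking an unknown entry, and it treats two fingerprints as compatible when they agree wherever both are known. The greedy first-fit grouping is only an illustration of the clustering task; it is not one of the 2-approximation algorithms from the paper and carries no approximation guarantee.

```python
MISSING = "N"  # assumed marker for an unknown entry


def compatible(u: str, v: str) -> bool:
    """True if u and v agree at every position where both are known."""
    if len(u) != len(v):
        raise ValueError("fingerprints must have equal length")
    return all(a == b or MISSING in (a, b) for a, b in zip(u, v))


def greedy_clusters(fingerprints: list[str]) -> list[list[str]]:
    """First-fit grouping into sets of pairwise compatible fingerprints.

    Illustrative heuristic only, not the algorithm from the paper.
    """
    clusters: list[list[str]] = []
    for fp in fingerprints:
        for cluster in clusters:
            if all(compatible(fp, member) for member in cluster):
                cluster.append(fp)
                break
        else:
            # No existing cluster accepts fp, so open a new one.
            clusters.append([fp])
    return clusters


if __name__ == "__main__":
    print(greedy_clusters(["01N", "0N1", "11N", "1N0"]))
```

    The result of first-fit depends on the input order, which is one reason the sketch should not be read as an approximation algorithm.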

    Origin of rat β-globin haplotypes containing three and five genes

    We have reported three adult β-gene haplotypes in rat, containing either five or three genes. Detailed sequence analysis reveals that the leftmost gene is the major gene and that at the opposite end downstream lies the minor gene. All of the genes lying between them are minor-major hybrids, indicating their origin by unequal crossing-over. In two haplotypes β-globin genes were found with an L1 element inserted directly into IVS2. The described results allow the formulation of a pathway of mutational events leading from the ancient two-β-gene rodent ancestor through a three-gene haplotype to five-gene haplotypes, one of which is postulated to have arisen in common laboratory strains since their capture in the wild. [https://academic.oup.com/mbe/article/7/5/407/1061225]

    CoolMPS for robust sequencing of single-nuclear RNAs captured by droplet-based method

    Massively-parallel single-cell and single-nucleus RNA sequencing (scRNA-seq, snRNA-seq) requires extensive sequencing to achieve proper per-cell coverage, making sequencing resources and availability of sequencers critical factors for conducting deep transcriptional profiling. CoolMPS is a novel sequencing-by-synthesis approach that relies on nucleotide labeling by re-usable antibodies, but whether it is applicable to snRNA-seq has not been tested. Here, we use a low-cost and off-the-shelf protocol to chemically convert libraries generated with the widely-used Chromium 10X technology to be sequenceable with CoolMPS technology. To assess the quality and performance of converted libraries sequenced with CoolMPS, we generated an snRNA-seq dataset from the hippocampus of young and old mice. Native libraries were sequenced on an Illumina NovaSeq and libraries that were converted to be compatible with CoolMPS were sequenced on a DNBSEQ-400RS. CoolMPS-derived data faithfully replicated key characteristics of the native library dataset, including correct estimation of ambient RNA-contamination, detection of captured cells, cell clustering results, spatial marker gene expression, inter- and intra-replicate differences and gene expression changes during aging. In conclusion, our results show that CoolMPS provides a viable alternative to standard sequencing of RNA from droplet-based libraries.

    Significant abundance of cis configurations of coding variants in diploid human genomes

    To fully understand human genetic variation and its functional consequences, the specific distribution of variants between the two chromosomal homologues of genes must be known. The 'phase' of variants can significantly impact gene function and phenotype. To assess patterns of phase at large scale, we have analyzed 18 121 autosomal genes in 1092 statistically phased genomes from the 1000 Genomes Project and 184 experimentally phased genomes from the Personal Genome Project. Here we show that genes with cis-configurations of coding variants are more frequent than genes with trans-configurations in a genome, with global cis/trans ratios of ∼60:40. Significant cis-abundance was observed in virtually all genomes in all populations. Moreover, we identified a large group of genes exhibiting cis-configurations of protein-changing variants in excess, so-called 'cis-abundant genes', and a smaller group of 'trans-abundant genes'. These two gene categories were functionally distinguishable, and exhibited strikingly different distributional patterns of protein-changing variants. Underlying these phenomena was a shared set of phase-sensitive genes of importance for adaptation and evolution. This work establishes common patterns of phase as key characteristics of diploid human exomes and provides evidence for their functional significance, highlighting the importance of phase for the interpretation of protein-coding genetic variation and gene function
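
    The cis/trans distinction can be illustrated with a small sketch. It assumes each coding variant in a gene is given as a phased genotype string in the VCF style `a|b` (0 = reference allele, 1 = alternate allele, left and right of the bar being the two homologues), and it uses a simplified definition: a gene is "cis" when all of its heterozygous alternate alleles sit on one homologue and "trans" when both homologues carry at least one. The function name and this simplification are assumptions for illustration, not the classification pipeline used in the study.

```python
from typing import Optional


def gene_configuration(phased_genotypes: list[str]) -> Optional[str]:
    """Classify a gene as 'cis' or 'trans' from phased genotypes like '0|1'.

    Only heterozygous variants are considered. Returns None when the gene
    has fewer than two heterozygous variants, since phase is then moot.
    """
    alt_on_first = 0   # heterozygous variants with the alt allele on homologue 1
    alt_on_second = 0  # heterozygous variants with the alt allele on homologue 2
    for genotype in phased_genotypes:
        first, second = genotype.split("|")
        if first == second:
            continue  # homozygous: no phase information
        if first == "1":
            alt_on_first += 1
        elif second == "1":
            alt_on_second += 1
    if alt_on_first + alt_on_second < 2:
        return None
    return "cis" if 0 in (alt_on_first, alt_on_second) else "trans"


if __name__ == "__main__":
    print(gene_configuration(["0|1", "0|1"]))  # both alt alleles on homologue 2
    print(gene_configuration(["0|1", "1|0"]))  # alt alleles on opposite homologues
```

    Counting the "cis" and "trans" labels over all genes of one genome would give a per-genome ratio of the kind reported in the abstract.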

    CoolMPS: evaluation of antibody labeling based massively parallel non-coding RNA sequencing

    Results of massive parallel sequencing-by-synthesis vary depending on the sequencing approach. CoolMPS™ is a new sequencing chemistry that incorporates bases by labeled antibodies. To evaluate the performance, we sequenced 240 human non-coding RNA samples (dementia patients and controls) with and without CoolMPS. The Q30 value as indicator of the per base sequencing quality increased from 91.8 to 94%. The higher quality was reached across the whole read length. Likewise, the percentage of reads mapping to the human genome increased from 84.9 to 86.2%. For both technologies, we computed similar distributions between different RNA classes (miRNA, piRNA, tRNA, snoRNA and yRNA) and within the classes. While standard sequencing-by-synthesis recovered more annotated miRNAs, CoolMPS yielded more novel miRNAs. The correlation between the two methods was 0.97. Evaluating the diagnostic performance, we observed lower minimal P-values for CoolMPS (adjusted P-value of 0.0006 versus 0.0004) and larger effect sizes (Cohen's d of 0.878 versus 0.9). Validating 19 miRNAs resulted in a correlation of 0.852 between CoolMPS and reverse transcriptase-quantitative polymerase chain reaction. Comparison to data generated with Illumina technology confirmed a known shift in the overall RNA composition. With CoolMPS we evaluated a novel sequencing-by-synthesis technology showing high performance for the analysis of non-coding RNAs.

    Untreated PKU patients without intellectual disability: SHANK gene family as a candidate modifier

    Phenylketonuria (PKU) is an inborn error of metabolism caused by variants in the phenylalanine hydroxylase (PAH) gene and is characterized by excessively high levels of phenylalanine in body fluids. PKU is a paradigm for a genetic disease that can be treated, and the majority of developed countries have population-based newborn screening. Thus, the combination of early diagnosis and immediate initiation of treatment has resulted in normal intelligence for treated PKU patients. Although PKU is a monogenic disease, decades of research and clinical practice have shown that the correlation between the genotype and the corresponding phenotype is not simple. Attempts have been made to discover modifier genes for the PKU cognitive phenotype, but without success so far. We conducted whole genome sequencing of 4 subjects from unrelated non-consanguineous families who presented with pathogenic mutations in the PAH gene, high blood phenylalanine concentrations and near-normal cognitive development despite no treatment. We used cross-sample analysis to select genes common to more than one patient. Thus, the SHANK gene family emerged as the only relevant gene family with variants detected in 3 of 4 analyzed patients. We detected two novel variants, p.Pro1591Ala in SHANK1 and p.Asp18Asn in SHANK2, as well as SHANK2:p.Gly46Ser, SHANK2:p.Pro1388_Phe1389insLeuPro and SHANK3:p.Pro1716Thr variants that were previously described. Computational analysis indicated that the identified variants do not abolish the function of SHANK proteins. However, changes in posttranslational modifications of SHANK proteins could influence functioning of the glutamatergic synapses and cytoskeleton regulation, and contribute to maintaining optimal synaptic density and number of dendritic spines. Our findings link the SHANK gene family and brain plasticity in PKU for the first time. We hypothesize that variant SHANK proteins maintain optimal synaptic density and number of dendritic spines under high concentrations of phenylalanine and could have a protective modifying effect on cognitive development of PKU patients.

    Genome dynamics of the human embryonic kidney 293 lineage in response to cell biology manipulations

    The HEK293 human cell lineage is widely used in cell biology and biotechnology. Here we use whole-genome resequencing of six 293 cell lines to study the dynamics of this aneuploid genome in response to the manipulations used to generate common 293 cell derivatives, such as transformation and stable clone generation (293T); suspension growth adaptation (293S); and cytotoxic lectin selection (293SG). Remarkably, we observe that copy number alteration detection could identify the genomic region that enabled cell survival under selective conditions (in this case, ricin selection). Furthermore, we present methods to detect human/vector genome breakpoints and a user-friendly visualization tool for the 293 genome data. We also establish that the genome structure composition is in steady state for most of these cell lines when standard cell culturing conditions are used. This resource enables novel and more informed studies with 293 cells, and we will distribute the sequenced cell lines to this effect.

    Sequencing by Hybridization of Long Targets

    Sequencing by Hybridization (SBH) reconstructs an n-long target DNA sequence from its biochemically determined l-long subsequences. In the standard approach, the length of a uniformly random sequence that can be unambiguously reconstructed is limited due to repetitive subsequences causing reconstruction degeneracies. We present a modified sequencing method that overcomes this limitation without the need for different types of biochemical assays and is robust to error.
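
    The reconstruction degeneracy of the standard approach can be seen on a toy example. The sketch below assumes an idealized, error-free spectrum (the set of all l-long substrings, with l at least 2) and enumerates by brute force every sequence of a given length that produces exactly that spectrum. The function names are hypothetical, the search is exponential and only meant for tiny inputs, and it illustrates standard SBH only, not the modified method the abstract proposes.

```python
def spectrum(target: str, l: int) -> frozenset:
    """Idealized SBH spectrum: the set of all l-long substrings of target."""
    return frozenset(target[i:i + l] for i in range(len(target) - l + 1))


def reconstructions(spec: frozenset, n: int, alphabet: str = "ACGT") -> list[str]:
    """All length-n sequences whose spectrum equals spec (brute force, l >= 2)."""
    l = len(next(iter(spec)))
    found: list[str] = []

    def extend(prefix: str) -> None:
        if len(prefix) == n:
            if spectrum(prefix, l) == spec:
                found.append(prefix)
            return
        for base in alphabet:
            # Only extend with l-mers that are actually in the spectrum.
            if prefix[-(l - 1):] + base in spec:
                extend(prefix + base)

    for start in sorted(spec):
        extend(start)
    return found


if __name__ == "__main__":
    # The 2-mer "AC" occurs three times, so the segments between its
    # occurrences can be swapped without changing the set of 3-mers.
    a, b = "ACGACTAC", "ACTACGAC"
    print(spectrum(a, 3) == spectrum(b, 3))
    print(reconstructions(spectrum(a, 3), len(a)))
```

    Here two different targets share one spectrum because a repeated (l-1)-long subsequence lets intermediate segments be reordered, which is the kind of degeneracy the abstract refers to.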