
    Locating regions in a sequence under density constraints

    Several biological problems require the identification of regions in a sequence where some feature occurs within a target density range; examples include the location of GC-rich regions, identification of CpG islands, and sequence matching. Mathematically, this corresponds to searching a string of 0s and 1s for a substring whose relative proportion of 1s lies between given lower and upper bounds. We consider the algorithmic problem of locating the longest such substring, as well as other related problems (such as finding the shortest substring or a maximal set of disjoint substrings). For locating the longest such substring, we develop an algorithm that runs in O(n) time, improving upon the previous best-known O(n log n) result. For the related problems we develop O(n log log n) algorithms, again improving upon the best-known O(n log n) results. Practical testing verifies that our new algorithms enjoy significantly smaller time and memory footprints, and can process sequences that are orders of magnitude longer as a result. Comment: 17 pages, 8 figures; v2: minor revisions, additional explanations; to appear in SIAM Journal on Computin
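To make the problem statement above concrete, here is a minimal quadratic-time baseline in Python. It is not the O(n) algorithm the abstract describes; it only illustrates what "longest substring whose proportion of 1s lies between given bounds" means. The function name `longest_density_window` and its parameters are assumptions introduced for this sketch.

```python
from fractions import Fraction
from itertools import accumulate


def longest_density_window(bits, lo, hi):
    """Return (start, end) of the longest non-empty slice bits[start:end]
    whose fraction of 1s lies in [lo, hi], or None if there is none.

    Brute-force reference (O(n^2) time), NOT the linear-time method
    from the paper; useful mainly for checking faster implementations.
    """
    # prefix[i] holds the number of 1s in bits[:i].
    prefix = [0, *accumulate(bits)]
    n = len(bits)
    # Try lengths from longest to shortest so the first hit is optimal.
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            ones = prefix[start + length] - prefix[start]
            # Compare ones/length against the bounds without dividing.
            if lo * length <= ones <= hi * length:
                return start, start + length
    return None


if __name__ == "__main__":
    window = longest_density_window(
        [1, 1, 1, 0, 1, 1], Fraction(0), Fraction(1, 2)
    )
    print(window)
```

Passing the bounds as `Fraction` values keeps the comparisons exact; floating-point bounds also work but may misclassify windows that sit exactly on a boundary.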

    Identification of presumed pathogenic KRT3 and KRT12 gene mutations associated with Meesmann corneal dystrophy.

    Purpose: To report potentially pathogenic mutations in the keratin 3 (KRT3) and keratin 12 (KRT12) genes in two individuals with clinically diagnosed Meesmann corneal dystrophy (MECD). Methods: Slit-lamp examination was performed on the probands and available family members to identify characteristic features of MECD. After informed consent was obtained, saliva samples were obtained as a source of genomic DNA, and screening of KRT3 and KRT12 was performed. Potentially pathogenic variants were screened for in 200 control chromosomes. PolyPhen-2, SIFT, and PANTHER were used to predict the functional impact of identified variants. Short tandem repeat genotyping was performed to confirm paternity. Results: Slit-lamp examination of the first proband demonstrated bilateral, diffusely distributed, clear epithelial microcysts, consistent with MECD. Screening of KRT3 revealed a heterozygous missense variant in exon 1, c.250C>T (p.(Arg84Trp)), which has a minor allele frequency of 0.0076 and was not identified in 200 control chromosomes. In silico analysis with PolyPhen-2 and PANTHER predicted the variant to be damaging to protein function; however, SIFT analysis predicted tolerance of the variant. The second proband demonstrated bilateral, diffusely distributed epithelial opacities that appeared gray-white on direct illumination and translucent on retroillumination. Neither parent demonstrated corneal opacities. Screening of KRT12 revealed a novel heterozygous insertion/deletion variant in exon 6, c.1288_1293delinsAGCCCT (p.(Arg430_Arg431delinsSerPro)). This variant was not present in either of the proband's parents or in 200 control chromosomes and was predicted to be damaging by PolyPhen-2, PANTHER, and SIFT. Haplotype analysis confirmed paternity of the second proband, indicating that the variant arose de novo. Conclusions: We present a novel KRT12 mutation, representing the first de novo mutation and the first indel in KRT12 associated with MECD. In addition, we report a variant of uncertain significance in KRT3 in an individual with MECD. Although the potential pathogenicity of this variant is unknown, it is the first variant affecting the head domain of K3 to be reported in an individual with MECD and suggests that disease-causing variants associated with MECD may not be restricted to primary sequence alterations of either the helix-initiation or helix-termination motifs of K3 and K12

    Searching a bitstream in linear time for the longest substring of any given density

    Given an arbitrary bitstream, we consider the problem of finding the longest substring whose ratio of ones to zeroes equals a given value. The central result of this paper is an algorithm that solves this problem in linear time. The method involves (i) reformulating the problem as a constrained walk through a sparse matrix, and then (ii) developing a data structure for this sparse matrix that allows us to perform each step of the walk in amortised constant time. We also give a linear time algorithm to find the longest substring whose ratio of ones to zeroes is bounded below by a given value. Both problems have practical relevance to cryptography and bioinformatics. Comment: 22 pages, 19 figures; v2: minor edits and enhancement
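As an illustration of the exact-ratio problem stated above, the sketch below uses a common prefix-balance trick with a hash map. This is a swapped-in technique, not the sparse-matrix walk from the abstract: it runs in expected (not worst-case) linear time and assumes the target ratio is given as two positive integers `p` and `q` (ones : zeroes = p : q). The name `longest_ratio_substring` is hypothetical.

```python
def longest_ratio_substring(bits, p, q):
    """Return (start, end) of the longest non-empty slice bits[start:end]
    in which ones : zeroes == p : q, or None if no such slice exists.

    Assumes p and q are positive integers. Hash-map sketch, NOT the
    paper's data structure.
    """
    # Score each 1 as +q and each 0 as -p. A slice then has
    # ones * q == zeroes * p exactly when its scores sum to zero,
    # i.e. when the running balance is equal at both ends.
    first_seen = {0: 0}  # balance value -> earliest prefix length
    balance = 0
    best = None
    for end, bit in enumerate(bits, start=1):
        balance += q if bit else -p
        if balance in first_seen:
            start = first_seen[balance]
            if best is None or end - start > best[1] - best[0]:
                best = (start, end)
        else:
            first_seen[balance] = end
    return best


if __name__ == "__main__":
    print(longest_ratio_substring([1, 0, 0, 1, 0, 0, 1, 1], 1, 2))
```

Storing only the earliest position of each balance value is what makes the longest match fall out of a single pass; ties are resolved in favour of the leftmost slice.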

    Genomic Expansion of Magnetotactic Bacteria Reveals an Early Common Origin of Magnetotaxis with Lineage-specific Evolution

    The origin and evolution of magnetoreception, which in diverse prokaryotes and protozoa is known as magnetotaxis and enables these microorganisms to detect Earth’s magnetic field for orientation and navigation, is not well understood in evolutionary biology. The only known prokaryotes capable of sensing the geomagnetic field are magnetotactic bacteria (MTB), motile microorganisms that biomineralize intracellular, membrane-bounded magnetic single-domain crystals of either magnetite (Fe3O4) or greigite (Fe3S4) called magnetosomes. Magnetosomes are responsible for magnetotaxis in MTB. Here we report the first large-scale metagenomic survey of MTB from both northern and southern hemispheres combined with 28 genomes from uncultivated MTB. These genomes expand greatly the coverage of MTB in the Proteobacteria, Nitrospirae, and Omnitrophica phyla, and provide the first genomic evidence of MTB belonging to the Zetaproteobacteria and “Candidatus Lambdaproteobacteria” classes. The gene content and organization of magnetosome gene clusters, which are physically grouped genes that encode proteins for magnetosome biosynthesis and organization, are more conserved within phylogenetically similar groups than between different taxonomic lineages. Moreover, the phylogenies of core magnetosome proteins form monophyletic clades. Together, these results suggest a common ancient origin of iron-based (Fe3O4 and Fe3S4) magnetotaxis in the domain Bacteria that underwent lineage-specific evolution, shedding new light on the origin and evolution of biomineralization and magnetotaxis, and expanding significantly the phylogenomic representation of MTB

    High-Mass X-ray Binaries and the Spiral Structure of the Host Galaxy

    We investigate the manifestation of the spiral structure in the distribution of high-mass X-ray binaries (HMXBs) over the host galaxy. We construct a simple kinematic model. It shows that, because of their finite lifetimes, the HMXBs should be displaced relative to the spiral structure observed in such traditional star formation rate indicators as the Hα and FIR emissions. Using Chandra observations of M51, we have studied the distribution of X-ray sources relative to the spiral arms of this galaxy observed in Hα. Based on K-band data and background source number counts, we have separated the contributions from high-mass and low-mass X-ray binaries and active galactic nuclei. In agreement with model predictions, the distribution of HMXBs is wider than that of bright HII regions concentrated in the region of ongoing star formation. However, the statistical significance of this result is low, as is the significance of the concentration of the total population of X-ray sources to the spiral arms. We also predict the distribution of HMXBs in our Galaxy in Galactic longitude. The distribution depends on the mean HMXB age and can differ significantly from the distributions of such young objects as ultracompact HII regions. Comment: 18 pages, 6 figures; Astronomy Letters, Vol. 33, No. 5, 2007, pp. 299-30

    GenomeVIP: A cloud platform for genomic variant discovery and interpretation

    Identifying genomic variants is a fundamental first step toward the understanding of the role of inherited and acquired variation in disease. The accelerating growth in the corpus of sequencing data that underpins such analysis is making the data-download bottleneck more evident, placing substantial burdens on the research community to keep pace. As a result, the search for alternative approaches to the traditional “download and analyze” paradigm on local computing resources has led to a rapidly growing demand for cloud-computing solutions for genomics analysis. Here, we introduce the Genome Variant Investigation Platform (GenomeVIP), an open-source framework for performing genomics variant discovery and annotation using cloud- or local high-performance computing infrastructure. GenomeVIP orchestrates the analysis of whole-genome and exome sequence data using a set of robust and popular task-specific tools, including VarScan, GATK, Pindel, BreakDancer, Strelka, and Genome STRiP, through a web interface. GenomeVIP has been used for genomic analysis in large-data projects such as the TCGA PanCanAtlas and in other projects, such as the ICGC Pilots, CPTAC, ICGC-TCGA DREAM Challenges, and the 1000 Genomes SV Project. Here, we demonstrate GenomeVIP's ability to provide high-confidence annotated somatic, germline, and de novo variants of potential biological significance using publicly available data sets.

    Surface Modification of Polycrystalline Diamond Compacts by Carbon Ion Irradiation

    Selective modification (e.g. defect creation and amorphization) of diamond surfaces is of interest for functional diamond-based semiconductors and devices. Bombarding the diamond surface with high-energy radiation sources such as electrons, protons, and neutrons, however, often results in detrimental defects in deep bulk regions under the diamond surface. In this study, we utilized high-energy carbon ions of 3 MeV to bombard a polycrystalline diamond compact (PDC) specimen. The resultant microstructure of the PDCs was investigated using micro-Raman spectroscopy. The results show that the carbon bombardment successfully created point defects and amorphization in a shallow region ∼500 nm deep on the diamond surface. The new method has great potential to allow diamond-based semiconductor devices to be used in numerous applications

    The role of tropical-extratropical interaction and synoptic variability in maintaining the South Pacific Convergence Zone in CMIP5 models

    The South Pacific Convergence Zone (SPCZ) is simulated as too zonal a feature in current generation climate models, including those in Phase 5 of the Coupled Model Intercomparison Project (CMIP5). This zonal bias induces errors in tropical convective heating, with subsequent effects on global circulation. The SPCZ structure, particularly in the subtropics, is governed by the tropical-extratropical interaction between transient synoptic systems and the mean background state. However, the fidelity of synoptic-scale interactions as simulated by CMIP5 models has not yet been evaluated. In this study, analysis of synoptic variability in the simulated subtropical SPCZ reveals that the basic mechanism of tropical-extratropical interaction is generally well simulated, with storms approaching the SPCZ along comparable trajectories to observations. However, there is a broad spread in mean precipitation and its variability across the CMIP5 ensemble. Inter-model spread appears to relate to a biased background state in which the synoptic waves propagate. In particular, the region of mean negative zonal stretching deformation or "storm graveyard" in the upper troposphere (a feature previously determined to play a key role in SPCZ-storm interactions) is typically displaced in CMIP5 models to the northeast of its position in reanalysis data, albeit with individual model graveyards displaying a pronounced (25 degree) longitudinal spread. From these findings, we suggest that SPCZs simulated by CMIP5 models are not simply too zonal; rather, in models the subtropical SPCZ manifests a diagonal tilt similar to observations while SST biases force an overly zonal tropical SPCZ, resulting in a more disjointed SPCZ than observed