8,617 research outputs found

    A clone-free, single molecule map of the domestic cow (Bos taurus) genome.

    Get PDF
    BackgroundThe cattle (Bos taurus) genome was originally selected for sequencing due to its economic importance and unique biology as a model organism for understanding other ruminants, or mammals. Currently, there are two cattle genome sequence assemblies (UMD3.1 and Btau4.6) from groups using dissimilar assembly algorithms, which were complemented by genetic and physical map resources. However, past comparisons between these assemblies revealed substantial differences. Consequently, such discordances have engendered ambiguities when using reference sequence data, impacting genomic studies in cattle and motivating construction of a new optical map resource--BtOM1.0--to guide comparisons and improvements to the current sequence builds. Accordingly, our comprehensive comparisons of BtOM1.0 against the UMD3.1 and Btau4.6 sequence builds tabulate large-to-immediate scale discordances requiring mediation.ResultsThe optical map, BtOM1.0, spanning the B. taurus genome (Hereford breed, L1 Dominette 01449) was assembled from an optical map dataset consisting of 2,973,315 (439 X; raw dataset size before assembly) single molecule optical maps (Rmaps; 1 Rmap = 1 restriction mapped DNA molecule) generated by the Optical Mapping System. The BamHI map spans 2,575.30 Mb and comprises 78 optical contigs assembled by a combination of iterative (using the reference sequence: UMD3.1) and de novo assembly techniques. BtOM1.0 is a high-resolution physical map featuring an average restriction fragment size of 8.91 Kb. Comparisons of BtOM1.0 vs. UMD3.1, or Btau4.6, revealed that Btau4.6 presented far more discordances (7,463) vs. UMD3.1 (4,754). Overall, we found that Btau4.6 presented almost double the number of discordances than UMD3.1 across most of the 6 categories of sequence vs. map discrepancies, which are: COMPLEX (misassembly), DELs (extraneous sequences), INSs (missing sequences), ITs (Inverted/Translocated sequences), ECs (extra restriction cuts) and MCs (missing restriction cuts).ConclusionAlignments of UMD3.1 and Btau4.6 to BtOM1.0 reveal discordances commensurate with previous reports, and affirm the NCBI's current designation of UMD3.1 sequence assembly as the "reference assembly" and the Btau4.6 as the "alternate assembly." The cattle genome optical map, BtOM1.0, when used as a comprehensive and largely independent guide, will greatly assist improvements to existing sequence builds, and later serve as an accurate physical scaffold for studies concerning the comparative genomics of cattle breeds

    Image-Processing Techniques for the Creation of Presentation-Quality Astronomical Images

    Full text link
    The quality of modern astronomical data, the power of modern computers and the agility of current image-processing software enable the creation of high-quality images in a purely digital form. The combination of these technological advancements has created a new ability to make color astronomical images. And in many ways it has led to a new philosophy towards how to create them. A practical guide is presented on how to generate astronomical images from research data with powerful image-processing programs. These programs use a layering metaphor that allows for an unlimited number of astronomical datasets to be combined in any desired color scheme, creating an immense parameter space to be explored using an iterative approach. Several examples of image creation are presented. A philosophy is also presented on how to use color and composition to create images that simultaneously highlight scientific detail and are aesthetically appealing. This philosophy is necessary because most datasets do not correspond to the wavelength range of sensitivity of the human eye. The use of visual grammar, defined as the elements which affect the interpretation of an image, can maximize the richness and detail in an image while maintaining scientific accuracy. By properly using visual grammar, one can imply qualities that a two-dimensional image intrinsically cannot show, such as depth, motion and energy. In addition, composition can be used to engage viewers and keep them interested for a longer period of time. The use of these techniques can result in a striking image that will effectively convey the science within the image, to scientists and to the public.Comment: 104 pages, 38 figures, submitted to A

    Discovery of large genomic inversions using long range information.

    Get PDF
    BackgroundAlthough many algorithms are now available that aim to characterize different classes of structural variation, discovery of balanced rearrangements such as inversions remains an open problem. This is mainly due to the fact that breakpoints of such events typically lie within segmental duplications or common repeats, which reduces the mappability of short reads. The algorithms developed within the 1000 Genomes Project to identify inversions are limited to relatively short inversions, and there are currently no available algorithms to discover large inversions using high throughput sequencing technologies.ResultsHere we propose a novel algorithm, VALOR, to discover large inversions using new sequencing methods that provide long range information such as 10X Genomics linked-read sequencing, pooled clone sequencing, or other similar technologies that we commonly refer to as long range sequencing. We demonstrate the utility of VALOR using both pooled clone sequencing and 10X Genomics linked-read sequencing generated from the genome of an individual from the HapMap project (NA12878). We also provide a comprehensive comparison of VALOR against several state-of-the-art structural variation discovery algorithms that use whole genome shotgun sequencing data.ConclusionsIn this paper, we show that VALOR is able to accurately discover all previously identified and experimentally validated large inversions in the same genome with a low false discovery rate. Using VALOR, we also predicted a novel inversion, which we validated using fluorescent in situ hybridization. VALOR is available at https://github.com/BilkentCompGen/VALOR

    SourcererCC: Scaling Code Clone Detection to Big Code

    Full text link
    Despite a decade of active research, there is a marked lack in clone detectors that scale to very large repositories of source code, in particular for detecting near-miss clones where significant editing activities may take place in the cloned code. We present SourcererCC, a token-based clone detector that targets three clone types, and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. SourcererCC uses an optimized inverted-index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, as well as the number of required token-comparisons needed to judge a potential clone. We evaluate the scalability, execution time, recall and precision of SourcererCC, and compare it to four publicly available and state-of-the-art tools. To measure recall, we use two recent benchmarks, (1) a large benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (250MLOC) using a standard workstation.Comment: Accepted for publication at ICSE'16 (preprint, unrevised

    A deeply branching thermophilic bacterium with an ancient acetyl-CoA pathway dominates a subsurface ecosystem

    Get PDF
    <div><p>A nearly complete genome sequence of <em>Candidatus</em> ‘Acetothermum autotrophicum’, a presently uncultivated bacterium in candidate division OP1, was revealed by metagenomic analysis of a subsurface thermophilic microbial mat community. Phylogenetic analysis based on the concatenated sequences of proteins common among 367 prokaryotes suggests that <em>Ca.</em> ‘A. autotrophicum’ is one of the earliest diverging bacterial lineages. It possesses a folate-dependent Wood-Ljungdahl (acetyl-CoA) pathway of CO<sub>2</sub> fixation, is predicted to have an acetogenic lifestyle, and possesses the newly discovered archaeal-autotrophic type of bifunctional fructose 1,6-bisphosphate aldolase/phosphatase. A phylogenetic analysis of the core gene cluster of the acethyl-CoA pathway, shared by acetogens, methanogens, some sulfur- and iron-reducers and dechlorinators, supports the hypothesis that the core gene cluster of <em>Ca.</em> ‘A. autotrophicum’ is a particularly ancient bacterial pathway. The habitat, physiology and phylogenetic position of <em>Ca.</em> ‘A. autotrophicum’ support the view that the first bacterial and archaeal lineages were H<sub>2</sub>-dependent acetogens and methanogenes living in hydrothermal environments.</p> </div

    Substrate-specific clades of active marine methylotrophs associated with a phytoplankton bloom in a temperate coastal environment

    Get PDF
    Marine microorganisms that consume one-carbon (C1) compounds are poorly described, despite their impact on global climate via an influence on aquatic and atmospheric chemistry. This study investigated marine bacterial communities involved in the metabolism of C1 compounds. These communities were of relevance to surface seawater and atmospheric chemistry in the context of a bloom that was dominated by phytoplankton known to produce dimethylsulfoniopropionate. In addition to using 16S rRNA gene fingerprinting and clone libraries to characterize samples taken from a bloom transect in July 2006, seawater samples from the phytoplankton bloom were incubated with 13C-labeled methanol, monomethylamine, dimethylamine, methyl bromide, and dimethyl sulfide to identify microbial populations involved in the turnover of C1 compounds, using DNA stable isotope probing. The [13C]DNA samples from a single time point were characterized and compared using denaturing gradient gel electrophoresis (DGGE), fingerprint cluster analysis, and 16S rRNA gene clone library analysis. Bacterial community DGGE fingerprints from 13C-labeled DNA were distinct from those obtained with the DNA of the nonlabeled community DNA and suggested some overlap in substrate utilization between active methylotroph populations growing on different C1 substrates. Active methylotrophs were affiliated with Methylophaga spp. and several clades of undescribed Gammaproteobacteria that utilized methanol, methylamines (both monomethylamine and dimethylamine), and dimethyl sulfide. rRNA gene sequences corresponding to populations assimilating 13C-labeled methyl bromide and other substrates were associated with members of the Alphaproteobacteria (e.g., the family Rhodobacteraceae), the Cytophaga-Flexibacter-Bacteroides group, and unknown taxa. This study expands the known diversity of marine methylotrophs in surface seawater and provides a comprehensive data set for focused cultivation and metagenomic analyses in the future
    • …
    corecore