A clone-free, single molecule map of the domestic cow (Bos taurus) genome.
Background: The cattle (Bos taurus) genome was originally selected for sequencing due to its economic importance and unique biology as a model organism for understanding other ruminants, or mammals. Currently, there are two cattle genome sequence assemblies (UMD3.1 and Btau4.6) from groups using dissimilar assembly algorithms, which were complemented by genetic and physical map resources. However, past comparisons between these assemblies revealed substantial differences. Consequently, such discordances have engendered ambiguities when using reference sequence data, impacting genomic studies in cattle and motivating construction of a new optical map resource, BtOM1.0, to guide comparisons and improvements to the current sequence builds. Accordingly, our comprehensive comparisons of BtOM1.0 against the UMD3.1 and Btau4.6 sequence builds tabulate large-to-immediate scale discordances requiring mediation.
Results: The optical map, BtOM1.0, spanning the B. taurus genome (Hereford breed, L1 Dominette 01449) was assembled from an optical map dataset consisting of 2,973,315 (439 X; raw dataset size before assembly) single molecule optical maps (Rmaps; 1 Rmap = 1 restriction mapped DNA molecule) generated by the Optical Mapping System. The BamHI map spans 2,575.30 Mb and comprises 78 optical contigs assembled by a combination of iterative (using the reference sequence: UMD3.1) and de novo assembly techniques. BtOM1.0 is a high-resolution physical map featuring an average restriction fragment size of 8.91 Kb. Comparisons of BtOM1.0 vs. UMD3.1, or Btau4.6, revealed that Btau4.6 presented far more discordances (7,463) vs. UMD3.1 (4,754). Overall, we found that Btau4.6 presented almost twice as many discordances as UMD3.1 across most of the 6 categories of sequence vs. map discrepancies, which are: COMPLEX (misassembly), DELs (extraneous sequences), INSs (missing sequences), ITs (Inverted/Translocated sequences), ECs (extra restriction cuts) and MCs (missing restriction cuts).
Conclusion: Alignments of UMD3.1 and Btau4.6 to BtOM1.0 reveal discordances commensurate with previous reports, and affirm the NCBI's current designation of UMD3.1 sequence assembly as the "reference assembly" and the Btau4.6 as the "alternate assembly." The cattle genome optical map, BtOM1.0, when used as a comprehensive and largely independent guide, will greatly assist improvements to existing sequence builds, and later serve as an accurate physical scaffold for studies concerning the comparative genomics of cattle breeds.
Image-Processing Techniques for the Creation of Presentation-Quality Astronomical Images
The quality of modern astronomical data, the power of modern computers and
the agility of current image-processing software enable the creation of
high-quality images in a purely digital form. The combination of these
technological advancements has created a new ability to make color astronomical
images. And in many ways it has led to a new philosophy towards how to create
them. A practical guide is presented on how to generate astronomical images
from research data with powerful image-processing programs. These programs use
a layering metaphor that allows for an unlimited number of astronomical
datasets to be combined in any desired color scheme, creating an immense
parameter space to be explored using an iterative approach. Several examples of
image creation are presented.
A philosophy is also presented on how to use color and composition to create
images that simultaneously highlight scientific detail and are aesthetically
appealing. This philosophy is necessary because most datasets do not correspond
to the wavelength range of sensitivity of the human eye. The use of visual
grammar, defined as the elements which affect the interpretation of an image,
can maximize the richness and detail in an image while maintaining scientific
accuracy. By properly using visual grammar, one can imply qualities that a
two-dimensional image intrinsically cannot show, such as depth, motion and
energy. In addition, composition can be used to engage viewers and keep them
interested for a longer period of time. The use of these techniques can result
in a striking image that will effectively convey the science within the image,
to scientists and to the public.
Comment: 104 pages, 38 figures, submitted to A
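The layering approach described in this abstract can be illustrated with a minimal Python sketch. It assumes each dataset is a small grayscale grid, that intensities are rescaled linearly, and that layers are combined with a "screen" blend; the abstract names none of these choices, so the function names (`stretch`, `colorize`, `composite`) and the blend mode are illustrative assumptions, not the authors' procedure.

```python
def stretch(image, lo, hi):
    """Linearly rescale pixel values so lo maps to 0.0 and hi to 1.0 (clipped)."""
    span = hi - lo
    return [[min(1.0, max(0.0, (v - lo) / span)) for v in row] for row in image]


def colorize(image, rgb):
    """Turn a grayscale layer into an RGB layer tinted with the given color."""
    return [[tuple(v * c for c in rgb) for v in row] for row in image]


def composite(layers):
    """Combine RGB layers with a screen blend: 1 - product of (1 - layer).

    The screen blend is an assumption made for this sketch; any number of
    layers of identical shape can be passed in.
    """
    height, width = len(layers[0]), len(layers[0][0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            remaining = [1.0, 1.0, 1.0]
            for layer in layers:
                for ch in range(3):
                    remaining[ch] *= 1.0 - layer[y][x][ch]
            row.append(tuple(1.0 - r for r in remaining))
        out.append(row)
    return out


# Hypothetical data: two 1x2 "exposures" assigned to red and blue.
red_layer = colorize(stretch([[0, 10]], 0, 10), (1, 0, 0))
blue_layer = colorize(stretch([[5, 5]], 0, 10), (0, 0, 1))
image = composite([red_layer, blue_layer])
```

Because every layer carries its own color and scaling, changing one dataset's tint or stretch does not touch the others, which is what makes the iterative exploration mentioned in the abstract practical.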
Discovery of large genomic inversions using long range information.
Background: Although many algorithms are now available that aim to characterize different classes of structural variation, discovery of balanced rearrangements such as inversions remains an open problem. This is mainly due to the fact that breakpoints of such events typically lie within segmental duplications or common repeats, which reduces the mappability of short reads. The algorithms developed within the 1000 Genomes Project to identify inversions are limited to relatively short inversions, and there are currently no available algorithms to discover large inversions using high throughput sequencing technologies.
Results: Here we propose a novel algorithm, VALOR, to discover large inversions using new sequencing methods that provide long range information such as 10X Genomics linked-read sequencing, pooled clone sequencing, or other similar technologies that we commonly refer to as long range sequencing. We demonstrate the utility of VALOR using both pooled clone sequencing and 10X Genomics linked-read sequencing generated from the genome of an individual from the HapMap project (NA12878). We also provide a comprehensive comparison of VALOR against several state-of-the-art structural variation discovery algorithms that use whole genome shotgun sequencing data.
Conclusions: In this paper, we show that VALOR is able to accurately discover all previously identified and experimentally validated large inversions in the same genome with a low false discovery rate. Using VALOR, we also predicted a novel inversion, which we validated using fluorescent in situ hybridization. VALOR is available at https://github.com/BilkentCompGen/VALOR
SourcererCC: Scaling Code Clone Detection to Big Code
Despite a decade of active research, there is a marked lack in clone
detectors that scale to very large repositories of source code, in particular
for detecting near-miss clones where significant editing activities may take
place in the cloned code. We present SourcererCC, a token-based clone detector
that targets three clone types, and exploits an index to achieve scalability to
large inter-project repositories using a standard workstation. SourcererCC uses
an optimized inverted-index to quickly query the potential clones of a given
code block. Filtering heuristics based on token ordering are used to
significantly reduce the size of the index, the number of code-block
comparisons needed to detect the clones, as well as the number of required
token-comparisons needed to judge a potential clone.
We evaluate the scalability, execution time, recall and precision of
SourcererCC, and compare it to four publicly available and state-of-the-art
tools. To measure recall, we use two recent benchmarks, (1) a large benchmark
of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of
thousands of fine-grained artificial clones. We find SourcererCC has both high
recall and precision, and is able to scale to a large inter-project repository
(250 MLOC) using a standard workstation.
Comment: Accepted for publication at ICSE'16 (preprint, unrevised
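The idea of querying an inverted index and pruning it with a token-ordering filter can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not SourcererCC's implementation: code blocks are reduced to sets of distinct tokens, similarity is overlap relative to the larger block, the threshold `theta` is a hypothetical parameter, and only a standard prefix filter (rarest tokens first) is applied. All function names are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(code):
    """Reduce a code block to its set of distinct identifier/number tokens."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+", code))


def prefix(tokens, freq, theta):
    """Return the rarest tokens of a block; only these are indexed and queried.

    If two blocks overlap in at least ceil(theta * size) tokens, their
    prefixes under the same global ordering must share a token.
    """
    ranked = sorted(tokens, key=lambda t: (freq[t], t))
    keep = len(ranked) - math.ceil(theta * len(ranked)) + 1
    return ranked[:keep]


def find_clones(blocks, theta=0.8):
    """Return pairs of block names whose token overlap meets the threshold."""
    token_sets = {name: tokenize(src) for name, src in blocks.items()}
    freq = Counter(t for s in token_sets.values() for t in s)
    index = defaultdict(set)  # token -> names of blocks indexed under it
    clones = set()
    for name, toks in token_sets.items():
        candidates = set()
        for t in prefix(toks, freq, theta):
            candidates |= index[t]  # query before inserting: no self-match
            index[t].add(name)
        for other in candidates:
            other_toks = token_sets[other]
            needed = math.ceil(theta * max(len(toks), len(other_toks)))
            if len(toks & other_toks) >= needed:
                clones.add(tuple(sorted((name, other))))
    return clones


# Hypothetical input: two near-identical functions and one unrelated one.
blocks = {
    "sum_fn": "int sum(int a, int b) { int total = a + b; return total; }",
    "add_fn": "int add(int a, int b) { int total = a + b; return total; }",
    "log_fn": "void log(char *msg) { printf(msg); }",
}
pairs = find_clones(blocks)
```

Indexing only the prefix keeps the index small, and a block is compared in full only against blocks that share one of its rare tokens, which is where the reduction in index size and comparisons comes from.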
A deeply branching thermophilic bacterium with an ancient acetyl-CoA pathway dominates a subsurface ecosystem
A nearly complete genome sequence of Candidatus 'Acetothermum autotrophicum', a presently uncultivated bacterium in candidate division OP1, was revealed by metagenomic analysis of a subsurface thermophilic microbial mat community. Phylogenetic analysis based on the concatenated sequences of proteins common among 367 prokaryotes suggests that Ca. 'A. autotrophicum' is one of the earliest diverging bacterial lineages. It possesses a folate-dependent Wood-Ljungdahl (acetyl-CoA) pathway of CO2 fixation, is predicted to have an acetogenic lifestyle, and possesses the newly discovered archaeal-autotrophic type of bifunctional fructose 1,6-bisphosphate aldolase/phosphatase. A phylogenetic analysis of the core gene cluster of the acetyl-CoA pathway, shared by acetogens, methanogens, some sulfur- and iron-reducers and dechlorinators, supports the hypothesis that the core gene cluster of Ca. 'A. autotrophicum' is a particularly ancient bacterial pathway. The habitat, physiology and phylogenetic position of Ca. 'A. autotrophicum' support the view that the first bacterial and archaeal lineages were H2-dependent acetogens and methanogens living in hydrothermal environments.
Substrate-specific clades of active marine methylotrophs associated with a phytoplankton bloom in a temperate coastal environment
Marine microorganisms that consume one-carbon (C1) compounds are poorly described, despite their impact on global climate via an influence on aquatic and atmospheric chemistry. This study investigated marine bacterial communities involved in the metabolism of C1 compounds. These communities were of relevance to surface seawater and atmospheric chemistry in the context of a bloom that was dominated by phytoplankton known to produce dimethylsulfoniopropionate. In addition to using 16S rRNA gene fingerprinting and clone libraries to characterize samples taken from a bloom transect in July 2006, seawater samples from the phytoplankton bloom were incubated with 13C-labeled methanol, monomethylamine, dimethylamine, methyl bromide, and dimethyl sulfide to identify microbial populations involved in the turnover of C1 compounds, using DNA stable isotope probing. The [13C]DNA samples from a single time point were characterized and compared using denaturing gradient gel electrophoresis (DGGE), fingerprint cluster analysis, and 16S rRNA gene clone library analysis. Bacterial community DGGE fingerprints from 13C-labeled DNA were distinct from those obtained with the DNA of the nonlabeled community and suggested some overlap in substrate utilization between active methylotroph populations growing on different C1 substrates. Active methylotrophs were affiliated with Methylophaga spp. and several clades of undescribed Gammaproteobacteria that utilized methanol, methylamines (both monomethylamine and dimethylamine), and dimethyl sulfide. rRNA gene sequences corresponding to populations assimilating 13C-labeled methyl bromide and other substrates were associated with members of the Alphaproteobacteria (e.g., the family Rhodobacteraceae), the Cytophaga-Flexibacter-Bacteroides group, and unknown taxa. This study expands the known diversity of marine methylotrophs in surface seawater and provides a comprehensive data set for focused cultivation and metagenomic analyses in the future.
Relationship between latent and rebound viruses in a clinical trial of anti-HIV-1 antibody 3BNC117.
A clinical trial was performed to evaluate 3BNC117, a potent anti-HIV-1 antibody, in infected individuals during suppressive antiretroviral therapy and subsequent analytical treatment interruption (ATI). The circulating reservoir was evaluated by quantitative and qualitative viral outgrowth assay (Q2VOA) at entry and after 6 months. There were no significant quantitative changes in the size of the reservoir before ATI, and the composition of circulating reservoir clones varied in a manner that did not correlate with 3BNC117 sensitivity. 3BNC117 binding site amino acid variants found in rebound viruses preexisted in the latent reservoir. However, only 3 of 217 rebound viruses were identical to 868 latent viruses isolated by Q2VOA and near full-length sequencing. Instead, 63% of the rebound viruses appeared to be recombinants, even in individuals with 3BNC117-resistant reservoir viruses. In conclusion, viruses emerging during ATI in individuals treated with 3BNC117 are not the dominant species found in the circulating latent reservoir, but frequently appear to represent recombinants of latent viruses.