DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing
We consider the correction of errors from nucleotide sequences produced by
next-generation targeted amplicon sequencing. Next-generation sequencing
(NGS) platforms can provide a great deal of sequencing data thanks to their
high throughput, but the associated error rates are often high.
Denoising in high-throughput sequencing has thus become a crucial process for
boosting the reliability of downstream analyses. Our methodology, named
DUDE-Seq, is derived from a general setting of reconstructing finite-valued
source data corrupted by a discrete memoryless channel and effectively corrects
substitution and homopolymer indel errors, the two major types of sequencing
errors in most high-throughput targeted amplicon sequencing platforms. Our
experimental studies with real and simulated datasets suggest that the proposed
DUDE-Seq not only outperforms existing alternatives in terms of
error-correction capability and time efficiency, but also boosts the
reliability of downstream analyses. Further, the flexibility of DUDE-Seq
enables its robust application to different sequencing platforms and analysis
pipelines by simple updates of the noise model. DUDE-Seq is available at
http://data.snu.ac.kr/pub/dude-seq
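The discrete-memoryless-channel setting mentioned in the abstract can be illustrated with a minimal sketch of a generic two-pass discrete universal denoiser (DUDE) for substitution errors. This is not the DUDE-Seq implementation: the function name `dude_denoise`, the context length `k`, the symmetric substitution channel with rate `delta`, and the Hamming loss are all assumptions chosen for illustration, and homopolymer indel handling is omitted.

```python
from collections import defaultdict

ALPHABET = "ACGT"

def dude_denoise(z, k=2, delta=0.05):
    """Two-pass DUDE sketch for substitution noise (hypothetical helper).

    Assumes a symmetric channel: each base is kept with probability
    1 - delta and replaced by each other base with probability delta / 3.
    """
    M = len(ALPHABET)
    b = delta / (M - 1)      # off-diagonal entry of the channel matrix Pi
    a = 1.0 - delta - b      # Pi = a*I + b*J, so Pi^{-1} = (I - b*J) / a
    n = len(z)

    # Pass 1: for every two-sided context, count the observed center symbols.
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))
    for i in range(k, n - k):
        ctx = (z[i - k:i], z[i + 1:i + k + 1])
        counts[ctx][z[i]] += 1

    # Pass 2: re-estimate each center symbol from its context statistics.
    out = list(z)
    for i in range(k, n - k):
        ctx = (z[i - k:i], z[i + 1:i + k + 1])
        m = counts[ctx]
        total = sum(m.values())
        # q = m^T Pi^{-1}: estimated counts of the clean symbols in this context.
        q = {x: (m[x] - b * total) / a for x in ALPHABET}
        # Weight each clean symbol by the chance the channel turns it into z[i].
        w = {x: q[x] * ((1.0 - delta) if x == z[i] else b) for x in ALPHABET}
        # Hamming loss: the cost of guessing x_hat is the weight of all other symbols.
        out[i] = min(
            ALPHABET,
            key=lambda x_hat: sum(w[x] for x in ALPHABET if x != x_hat),
        )
    return "".join(out)

# Usage: a periodic read with one substituted base.
clean = "ACGT" * 50
noisy = clean[:102] + "T" + clean[103:]
denoised = dude_denoise(noisy)
```

The first and last `k` positions have no full context and are left unchanged in this sketch. A different platform would be modeled by replacing the assumed symmetric channel with an estimated noise matrix, which then needs a general matrix inverse instead of the closed form used here.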
Evidence of global-scale aeolian dispersal and endemism in isolated geothermal microbial communities of Antarctica
New evidence in aerobiology challenges the assumption that geographical isolation is an effective barrier to microbial transport. However, given the uncertainty with which aerobiological organisms are recruited into existing communities, the ultimate impact of microbial dispersal is difficult to assess. To evaluate the ecological significance of global-scale microbial dispersal, molecular genetic approaches were used to examine microbial communities inhabiting fumarolic soils on Mt. Erebus, the southernmost geothermal site on Earth. There, hot, fumarolic soils provide an effective environmental filter to test the viability of organisms that have been distributed via aeolian transport over geological time. We find that cosmopolitan thermophiles dominate the surface, whereas endemic Archaea and members of poorly understood Bacterial candidate divisions dominate the immediate subsurface. These results imply that aeolian processes readily disperse viable organisms globally, where they are incorporated into pre-existing complex communities of endemic and cosmopolitan taxa
A two-band approach to nλ phase error corrections with LBTI's PHASECam
PHASECam is the Large Binocular Telescope Interferometer's (LBTI) phase
sensor, a near-infrared camera which is used to measure tip/tilt and phase
variations between the two AO-corrected apertures of the Large Binocular
Telescope (LBT). Tip/tilt and phase sensing are currently performed in the H
(1.65 μm) and K (2.2 μm) bands at 1 kHz, and the K band phase telemetry
is used to send tip/tilt and Optical Path Difference (OPD) corrections to the
system. However, phase variations outside the range [-π, π] are not
sensed, and thus are not fully corrected during closed-loop operation.
PHASECam's phase unwrapping algorithm, which attempts to mitigate this issue,
still occasionally fails in the case of fast, large phase variations. This can
cause a fringe jump, in which case the unwrapped phase will be incorrect by a
wavelength or more. This can currently be manually corrected by the observer,
but this is inefficient. A more reliable and automated solution is desired,
especially as the LBTI begins to commission further modes which require robust,
active phase control, including controlled multi-axial (Fizeau) interferometry
and dual-aperture non-redundant aperture masking interferometry. We present a
multi-wavelength method of fringe jump capture and correction which involves
direct comparison between the K band and currently unused H band phase
telemetry.
Comment: 17 pages, 10 figures
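The idea of comparing K band and H band phase to catch a fringe jump can be sketched as follows. This is an illustrative reconstruction, not the authors' algorithm: the function names, the candidate range of ±1 K-band wavelength, and the noise-free inputs are assumptions. Each candidate integer number of K wavelengths implies an OPD, which in turn predicts an H band phase; the candidate whose prediction best matches the measured H phase is selected.

```python
import math

LAMBDA_K = 2.2   # K band wavelength in micrometres (from the abstract)
LAMBDA_H = 1.65  # H band wavelength in micrometres (from the abstract)

def wrap(phi):
    """Wrap a phase in radians into the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi

def resolve_fringe_jump(phi_k, phi_h, max_jump=1):
    """Pick the integer K-band fringe offset n that best explains both phases.

    phi_k, phi_h: wrapped phases (radians) measured in K and H.
    max_jump: largest |n| tried (hypothetical parameter). Because
    LAMBDA_K / LAMBDA_H = 4/3, the two-band pattern repeats every
    3 K wavelengths, so |n| <= 1 is the unambiguous range.
    Returns (n, opd_in_micrometres).
    """
    best = None
    for n in range(-max_jump, max_jump + 1):
        opd = (phi_k / (2.0 * math.pi) + n) * LAMBDA_K
        predicted_h = wrap(2.0 * math.pi * opd / LAMBDA_H)
        residual = abs(wrap(predicted_h - phi_h))
        if best is None or residual < best[0]:
            best = (residual, n, opd)
    return best[1], best[2]

# Usage: a true OPD of 2.64 micrometres is 1.2 K wavelengths, so the
# wrapped K phase alone misses one full fringe.
true_opd = 2.64
phi_k = wrap(2.0 * math.pi * true_opd / LAMBDA_K)
phi_h = wrap(2.0 * math.pi * true_opd / LAMBDA_H)
n, opd = resolve_fringe_jump(phi_k, phi_h)
```

With real telemetry the residuals would be noisy, so a practical version would likely average over several frames before accepting a jump.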
Towards Better Understanding of Artifacts in Variant Calling from High-Coverage Samples
Motivation: Whole-genome high-coverage sequencing has been widely used for
personal and cancer genomics as well as in various research areas. However, in
the absence of an unbiased whole-genome truth set, the global error rate of
variant calls and the leading causal artifacts remain unclear, despite
the considerable effort spent on evaluating variant calling methods.
Results: We made ten SNP and INDEL call sets with two read mappers and five
variant callers, both on a haploid human genome and a diploid genome at a
similar coverage. By investigating false heterozygous calls in the haploid
genome, we identified the erroneous realignment in low-complexity regions and
the incomplete reference genome with respect to the sample as the two major
sources of errors, which press for continued improvements in these two areas.
We estimated that the error rate of raw genotype calls is as high as 1 in
10-15kb, but the error rate of post-filtered calls is reduced to 1 in 100-200kb
without significant compromise on the sensitivity.
Availability: BWA-MEM alignment: http://bit.ly/1g8XqRt; Scripts:
https://github.com/lh3/varcmp; Additional data:
https://figshare.com/articles/Towards_better_understanding_of_artifacts_in_variating_calling_from_high_coverage_samples/981073
Comment: Published version
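To put the quoted error rates in perspective, the short calculation below converts "1 error per N kb" into an expected number of erroneous calls per genome. The genome size of 3 billion base pairs is a round-number assumption, not a figure from the abstract, and `expected_errors` is a hypothetical helper.

```python
GENOME_BP = 3_000_000_000  # assumed approximate human genome size in bp

def expected_errors(kb_per_error, genome_bp=GENOME_BP):
    """Expected number of erroneous calls at a rate of 1 error per kb_per_error kb."""
    return genome_bp / (kb_per_error * 1000)

# Raw genotype calls: 1 error in 10-15 kb.
raw_range = (expected_errors(15), expected_errors(10))
# Post-filtered calls: 1 error in 100-200 kb.
filtered_range = (expected_errors(200), expected_errors(100))
```

Under this assumption the raw calls correspond to roughly 200,000 to 300,000 errors per genome, and the filtered calls to roughly 15,000 to 30,000.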
Genome-wide signatures of complex introgression and adaptive evolution in the big cats.
The great cats of the genus Panthera comprise a recent radiation whose evolutionary history is poorly understood. Their rapid diversification poses challenges to resolving their phylogeny while offering opportunities to investigate the historical dynamics of adaptive divergence. We report the sequence, de novo assembly, and annotation of the jaguar (Panthera onca) genome, a novel genome sequence for the leopard (Panthera pardus), and comparative analyses encompassing all living Panthera species. Demographic reconstructions indicated that all of these species have experienced variable episodes of population decline during the Pleistocene, ultimately leading to small effective sizes in present-day genomes. We observed pervasive genealogical discordance across Panthera genomes, caused by both incomplete lineage sorting and complex patterns of historical interspecific hybridization. We identified multiple signatures of species-specific positive selection, affecting genes involved in craniofacial and limb development, protein metabolism, hypoxia, reproduction, pigmentation, and sensory perception. There was remarkable concordance in pathways enriched in genomic segments implicated in interspecies introgression and in positive selection, suggesting that these processes were connected. We tested this hypothesis by developing exome capture probes targeting ~19,000 Panthera genes and applying them to 30 wild-caught jaguars. We found at least two genes (DOCK3 and COL4A5, both related to optic nerve development) bearing significant signatures of interspecies introgression and within-species positive selection. These findings indicate that post-speciation admixture has contributed genetic material that facilitated the adaptive evolution of big cat lineages
The Panchromatic Hubble Andromeda Treasury
The Panchromatic Hubble Andromeda Treasury (PHAT) is an on-going HST
Multicycle Treasury program to image ~1/3 of M31's star forming disk in 6
filters, from the UV to the NIR. The full survey will resolve the galaxy into
more than 100 million stars with projected radii from 0-20 kpc over a
contiguous 0.5 square degree area in 828 orbits, producing imaging in the F275W
and F336W filters with WFC3/UVIS, F475W and F814W with ACS/WFC, and F110W and
F160W with WFC3/IR. The resulting wavelength coverage gives excellent
constraints on stellar temperature, bolometric luminosity, and extinction for
most spectral types. The photometry reaches SNR=4 at F275W=25.1, F336W=24.9,
F475W=27.9, F814W=27.1, F110W=25.5, and F160W=24.6 for single pointings in the
uncrowded outer disk; however, the optical and NIR data are crowding limited,
and the deepest reliable magnitudes are up to 5 magnitudes brighter in the
inner bulge. All pointings are dithered and produce Nyquist-sampled images in
F475W, F814W, and F160W. We describe the observing strategy, photometry,
astrometry, and data products, along with extensive tests of photometric
stability, crowding errors, spatially-dependent photometric biases, and
telescope pointing control. We report on initial fits to the structure of M31's
disk, derived from the density of RGB stars, in a way that is independent of
the assumed M/L and is robust to variations in dust extinction. These fits also
show that the 10 kpc ring is not just a region of enhanced recent star
formation, but is instead a dynamical structure containing a significant
overdensity of stars with ages >1 Gyr. (Abridged)
Comment: 48 pages including 22 pages of figures. Accepted to the Astrophysical
Journal Supplements. Some figures slightly degraded to reduce submission size
The fate of Arabidopsis thaliana homeologous CNSs and their motifs in the Paleohexaploid Brassica rapa.
Following polyploidy, duplicate genes are often deleted, and if they are not, then duplicate regulatory regions are sometimes lost. By what mechanism is this loss and what is the chance that such a loss removes function? To explore these questions, we followed individual Arabidopsis thaliana-A. thaliana conserved noncoding sequences (CNSs) into the Brassica ancestor, through a paleohexaploidy and into Brassica rapa. Thus, a single Brassicaceae CNS has six potential orthologous positions in B. rapa; a single Arabidopsis CNS has three potential homeologous positions. We reasoned that a CNS, if present on a singlet Brassica gene, would be unlikely to lose function compared with a more redundant CNS, and this is the case. Redundant CNSs go nondetectable often. Using this logic, each mechanism of CNS loss was assigned a metric of functionality. By definition, proved deletions do not function as sequence. Our results indicated that CNSs that go nondetectable by base substitution or large insertion are almost certainly still functional (redundancy does not matter much to their detectability frequency), whereas those lost by inferred deletion or indels are approximately 75% likely to be nonfunctional. Overall, an average nondetectable, once-redundant CNS more than 30 bp in length has a 72% chance of being nonfunctional, and that makes sense because 97% of them sort to a molecular mechanism with deletion in its description, but base substitutions do cause loss. Similarly, proved-functional G-boxes go undetectable by deletion 82% of the time. Fractionation mutagenesis is a procedure that uses polyploidy as a mutagenic agent to genetically alter RNA expression profiles, and then to construct testable hypotheses as to the function of the lost regulatory site. We show fractionation mutagenesis to be a deletion machine in the Brassica lineage
Optimization of DNA extraction from human urinary samples for mycobiome community profiling.
Introduction: Recent data suggest the urinary tract hosts a microbial community of varying composition, even in the absence of infection. Culture-independent methodologies, such as next-generation sequencing of conserved ribosomal DNA sequences, provide an expansive look at these communities, identifying both common commensals and fastidious organisms. A fundamental challenge has been the isolation of DNA representative of the entire resident microbial community, including fungi.
Materials and methods: We evaluated multiple modifications of commonly-used DNA extraction procedures using standardized male and female urine samples, comparing resulting overall, fungal and bacterial DNA yields by quantitative PCR. After identifying protocol modifications that increased DNA yields (lyticase/lysozyme digestion, bead beating, boil/freeze cycles, proteinase K treatment, and carrier DNA use), all modifications were combined for systematic confirmation of optimal protocol conditions. This optimized protocol was tested against commercially available methodologies to compare overall and microbial DNA yields, community representation and diversity by next-generation sequencing (NGS).
Results: Overall and fungal-specific DNA yields from standardized urine samples demonstrated that microbial abundances differed significantly among the eight methods used. Methodologies that included multiple disruption steps, including enzymatic, mechanical, and thermal disruption and proteinase digestion, particularly in combination with small volume processing and pooling steps, provided more comprehensive representation of the range of bacterial and fungal species. Concentration of larger volume urine specimens at low speed centrifugation proved highly effective, increasing resulting DNA levels and providing greater microbial representation and diversity.
Conclusions: Alterations in the methodology of urine storage, preparation, and DNA processing improve microbial community profiling using culture-independent sequencing methods. Our optimized protocol for DNA extraction from urine samples provided improved fungal community representation. Use of this technique resulted in equivalent representation of the bacterial populations as well, making this a useful technique for the concurrent evaluation of bacterial and fungal populations by NGS.
The ACS Survey of Galactic Globular Clusters XI: The Three-Dimensional Orientation of the Sagittarius Dwarf Spheroidal Galaxy and its Globular Clusters
We use observations from the ACS study of Galactic globular clusters to
investigate the spatial distribution of the inner regions of the disrupting
Sagittarius dwarf spheroidal galaxy (Sgr). We combine previously published
analyses of four Sgr member clusters located near or in the Sgr core (M54, Arp
2, Terzan 7 and Terzan 8) with a new analysis of diffuse Sgr material
identified in the background of five low-latitude Galactic bulge clusters (NGC
6624, 6637, 6652, 6681 and 6809) observed as part of the ACS survey. By
comparing the bulge cluster CMDs to our previous analysis of the M54/Sgr core,
we estimate distances to these background features. The combined data from four
Sgr member clusters and five Sgr background features provide nine independent
measures of the Sgr distance and, as a group, provide uniformly measured and
calibrated probes of different parts of the inner regions of Sgr spanning
twenty degrees over the face of the disrupting dwarf. This allows us, for the
first time, to constrain the three dimensional orientation of Sgr's disrupting
core and globular cluster system and compare that orientation to the
predictions of an N-body model of tidal disruption. The density and distance of
Sgr debris are consistent with models that favor a relatively high Sgr core mass
and a slightly greater distance (28-30 kpc, with a mean of 29.4 kpc). Our
analysis also suggests that M54 is in the foreground of Sgr by ~2 kpc,
projected on the center of the Sgr dSph. While this would imply a remarkable
alignment of the cluster and the Sgr nucleus along the line of sight, we
cannot identify any systematic effect in our analysis that would falsely create
the measured 2 kpc separation. Finally, we find that the cluster Terzan 7 has
the most discrepant distance (25 kpc) among the four Sgr core clusters, which
may suggest a different dynamical history than the other Sgr core clusters.
Comment: 41 pages, 16 figures, accepted to Ap