    DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing

    We consider the correction of errors from nucleotide sequences produced by next-generation targeted amplicon sequencing. The next-generation sequencing (NGS) platforms can provide a great deal of sequencing data thanks to their high throughput, but the associated error rates often tend to be high. Denoising in high-throughput sequencing has thus become a crucial process for boosting the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel and effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that the proposed DUDE-Seq not only outperforms existing alternatives in terms of error-correction capability and time efficiency, but also boosts the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines by simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq
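The general setting named in the abstract above, reconstructing finite-valued data corrupted by a discrete memoryless channel, is the setting of the generic two-pass DUDE rule. The sketch below illustrates that generic rule only, not the DUDE-Seq implementation. It assumes a symmetric substitution channel with error rate `delta`, Hamming loss, and a context of `k` symbols on each side; the function name `dude_denoise` and all parameter values are hypothetical. Homopolymer indel handling and the platform-specific noise models mentioned in the abstract are not covered.

```python
from collections import Counter, defaultdict

ALPHABET = "ACGT"


def dude_denoise(z, k=1, delta=0.1):
    """Generic two-pass DUDE sketch (assumed symmetric channel, Hamming loss)."""
    b = delta / (len(ALPHABET) - 1)  # assumed P(observed | clean) when they differ
    a = 1.0 - delta - b              # channel matrix Pi = a*I + b*J, Pi^{-1} = (I - b*J) / a

    def context(i):
        return z[i - k:i], z[i + 1:i + 1 + k]

    def channel(clean, observed):
        return 1.0 - delta if clean == observed else b

    # Pass 1: count which symbols occur inside each two-sided context.
    counts = defaultdict(Counter)
    for i in range(k, len(z) - k):
        counts[context(i)][z[i]] += 1

    # Pass 2: at each position pick the symbol with the smallest estimated loss.
    out = list(z)
    for i in range(k, len(z) - k):
        m = counts[context(i)]
        total = sum(m.values())
        # q = m^T Pi^{-1}: estimated clean-symbol counts for this context.
        q = {x: (m[x] - b * total) / a for x in ALPHABET}

        def expected_loss(x_hat):
            # Hamming loss: every clean symbol other than x_hat costs 1.
            return sum(q[x] * channel(x, z[i]) for x in ALPHABET if x != x_hat)

        out[i] = min(ALPHABET, key=expected_loss)
    return "".join(out)  # the first and last k symbols are left unchanged


clean = "ACGT" * 20
noisy = clean[:42] + "T" + clean[43:]  # one G -> T substitution
print(dude_denoise(noisy) == clean)
```

In this toy input the context (C, T) contains G nineteen times and T once, so the rule replaces the lone T with G. With far fewer repeats the same rule would keep the observed symbol, because the evidence would not outweigh the assumed channel reliability.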

    Evidence of global-scale aeolian dispersal and endemism in isolated geothermal microbial communities of Antarctica

    New evidence in aerobiology challenges the assumption that geographical isolation is an effective barrier to microbial transport. However, given the uncertainty with which aerobiological organisms are recruited into existing communities, the ultimate impact of microbial dispersal is difficult to assess. To evaluate the ecological significance of global-scale microbial dispersal, molecular genetic approaches were used to examine microbial communities inhabiting fumarolic soils on Mt. Erebus, the southernmost geothermal site on Earth. There, hot, fumarolic soils provide an effective environmental filter to test the viability of organisms that have been distributed via aeolian transport over geological time. We find that cosmopolitan thermophiles dominate the surface, whereas endemic Archaea and members of poorly understood Bacterial candidate divisions dominate the immediate subsurface. These results imply that aeolian processes readily disperse viable organisms globally, where they are incorporated into pre-existing complex communities of endemic and cosmopolitan taxa

    A two-band approach to nλ phase error corrections with LBTI's PHASECam

    PHASECam is the Large Binocular Telescope Interferometer's (LBTI) phase sensor, a near-infrared camera which is used to measure tip/tilt and phase variations between the two AO-corrected apertures of the Large Binocular Telescope (LBT). Tip/tilt and phase sensing are currently performed in the H (1.65 μm) and K (2.2 μm) bands at 1 kHz, and the K band phase telemetry is used to send tip/tilt and Optical Path Difference (OPD) corrections to the system. However, phase variations outside the range [-π, π] are not sensed, and thus are not fully corrected during closed-loop operation. PHASECam's phase unwrapping algorithm, which attempts to mitigate this issue, still occasionally fails in the case of fast, large phase variations. This can cause a fringe jump, in which case the unwrapped phase will be incorrect by a wavelength or more. This can currently be manually corrected by the observer, but this is inefficient. A more reliable and automated solution is desired, especially as the LBTI begins to commission further modes which require robust, active phase control, including controlled multi-axial (Fizeau) interferometry and dual-aperture non-redundant aperture masking interferometry. We present a multi-wavelength method of fringe jump capture and correction which involves direct comparison between the K band and currently unused H band phase telemetry. Comment: 17 pages, 10 figure
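To illustrate why a second band helps with the fringe jumps described above, here is a minimal sketch of a generic two-wavelength consistency check. It is not PHASECam's actual algorithm. It assumes the nominal band centres quoted in the abstract (1.65 and 2.2 μm), noise-free phases, and a hypothetical function name `fringe_offset`. Each candidate K-band fringe offset implies an OPD, which in turn predicts an H-band phase; the candidate whose prediction best matches the measured H-band phase is selected.

```python
import math

LAMBDA_H_UM = 1.65  # nominal H-band wavelength from the abstract
LAMBDA_K_UM = 2.2   # nominal K-band wavelength from the abstract


def wrap(phi):
    """Wrap a phase in radians into [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def fringe_offset(phi_k, phi_h, candidates=(-1, 0, 1)):
    """Return the whole-fringe K-band offset most consistent with the H-band phase."""
    def residual(n):
        # OPD implied by the K-band phase plus n whole K-band wavelengths.
        opd_um = (phi_k / (2.0 * math.pi) + n) * LAMBDA_K_UM
        predicted_h = 2.0 * math.pi * opd_um / LAMBDA_H_UM
        return abs(wrap(phi_h - predicted_h))
    return min(candidates, key=residual)


# Hypothetical example: the true OPD is 1.3 K-band wavelengths, so the wrapped
# K-band phase alone is missing exactly one fringe.
true_opd_um = 1.3 * LAMBDA_K_UM
phi_k = wrap(2.0 * math.pi * true_opd_um / LAMBDA_K_UM)
phi_h = wrap(2.0 * math.pi * true_opd_um / LAMBDA_H_UM)
print(fringe_offset(phi_k, phi_h))
```

Because 2.2 / 1.65 = 4/3, each K-band fringe shifts the predicted H-band phase by one third of a cycle, so offsets that differ by three K-band fringes cannot be distinguished with these two nominal wavelengths. That is why the sketch restricts the candidates to -1, 0, and 1.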

    Towards Better Understanding of Artifacts in Variant Calling from High-Coverage Samples

    Motivation: Whole-genome high-coverage sequencing has been widely used for personal and cancer genomics as well as in various research areas. However, in the absence of an unbiased whole-genome truth set, the global error rate of variant calls and the leading causal artifacts remain unclear despite great efforts in the evaluation of variant calling methods. Results: We made ten SNP and INDEL call sets with two read mappers and five variant callers, both on a haploid human genome and a diploid genome at a similar coverage. By investigating false heterozygous calls in the haploid genome, we identified the erroneous realignment in low-complexity regions and the incomplete reference genome with respect to the sample as the two major sources of errors, which press for continued improvements in these two areas. We estimated that the error rate of raw genotype calls is as high as 1 in 10-15kb, but the error rate of post-filtered calls is reduced to 1 in 100-200kb without significant compromise on the sensitivity. Availability: BWA-MEM alignment: http://bit.ly/1g8XqRt; Scripts: https://github.com/lh3/varcmp; Additional data: https://figshare.com/articles/Towards_better_understanding_of_artifacts_in_variating_calling_from_high_coverage_samples/981073 Comment: Published versio

    Genome-wide signatures of complex introgression and adaptive evolution in the big cats.

    The great cats of the genus Panthera comprise a recent radiation whose evolutionary history is poorly understood. Their rapid diversification poses challenges to resolving their phylogeny while offering opportunities to investigate the historical dynamics of adaptive divergence. We report the sequence, de novo assembly, and annotation of the jaguar (Panthera onca) genome, a novel genome sequence for the leopard (Panthera pardus), and comparative analyses encompassing all living Panthera species. Demographic reconstructions indicated that all of these species have experienced variable episodes of population decline during the Pleistocene, ultimately leading to small effective sizes in present-day genomes. We observed pervasive genealogical discordance across Panthera genomes, caused by both incomplete lineage sorting and complex patterns of historical interspecific hybridization. We identified multiple signatures of species-specific positive selection, affecting genes involved in craniofacial and limb development, protein metabolism, hypoxia, reproduction, pigmentation, and sensory perception. There was remarkable concordance in pathways enriched in genomic segments implicated in interspecies introgression and in positive selection, suggesting that these processes were connected. We tested this hypothesis by developing exome capture probes targeting ~19,000 Panthera genes and applying them to 30 wild-caught jaguars. We found at least two genes (DOCK3 and COL4A5, both related to optic nerve development) bearing significant signatures of interspecies introgression and within-species positive selection. These findings indicate that post-speciation admixture has contributed genetic material that facilitated the adaptive evolution of big cat lineages

    The Panchromatic Hubble Andromeda Treasury

    The Panchromatic Hubble Andromeda Treasury (PHAT) is an on-going HST Multicycle Treasury program to image ~1/3 of M31's star forming disk in 6 filters, from the UV to the NIR. The full survey will resolve the galaxy into more than 100 million stars with projected radii from 0-20 kpc over a contiguous 0.5 square degree area in 828 orbits, producing imaging in the F275W and F336W filters with WFC3/UVIS, F475W and F814W with ACS/WFC, and F110W and F160W with WFC3/IR. The resulting wavelength coverage gives excellent constraints on stellar temperature, bolometric luminosity, and extinction for most spectral types. The photometry reaches SNR=4 at F275W=25.1, F336W=24.9, F475W=27.9, F814W=27.1, F110W=25.5, and F160W=24.6 for single pointings in the uncrowded outer disk; however, the optical and NIR data are crowding limited, and the deepest reliable magnitudes are up to 5 magnitudes brighter in the inner bulge. All pointings are dithered and produce Nyquist-sampled images in F475W, F814W, and F160W. We describe the observing strategy, photometry, astrometry, and data products, along with extensive tests of photometric stability, crowding errors, spatially-dependent photometric biases, and telescope pointing control. We report on initial fits to the structure of M31's disk, derived from the density of RGB stars, in a way that is independent of the assumed M/L and is robust to variations in dust extinction. These fits also show that the 10 kpc ring is not just a region of enhanced recent star formation, but is instead a dynamical structure containing a significant overdensity of stars with ages >1 Gyr. (Abridged) Comment: 48 pages including 22 pages of figures. Accepted to the Astrophysical Journal Supplements. Some figures slightly degraded to reduce submission siz

    The fate of Arabidopsis thaliana homeologous CNSs and their motifs in the Paleohexaploid Brassica rapa.

    Following polyploidy, duplicate genes are often deleted, and if they are not, then duplicate regulatory regions are sometimes lost. By what mechanism is this loss and what is the chance that such a loss removes function? To explore these questions, we followed individual Arabidopsis thaliana-A. thaliana conserved noncoding sequences (CNSs) into the Brassica ancestor, through a paleohexaploidy and into Brassica rapa. Thus, a single Brassicaceae CNS has six potential orthologous positions in B. rapa; a single Arabidopsis CNS has three potential homeologous positions. We reasoned that a CNS, if present on a singlet Brassica gene, would be unlikely to lose function compared with a more redundant CNS, and this is the case. Redundant CNSs go nondetectable often. Using this logic, each mechanism of CNS loss was assigned a metric of functionality. By definition, proved deletions do not function as sequence. Our results indicated that CNSs that go nondetectable by base substitution or large insertion are almost certainly still functional (redundancy does not matter much to their detectability frequency), whereas those lost by inferred deletion or indels are approximately 75% likely to be nonfunctional. Overall, an average nondetectable, once-redundant CNS more than 30 bp in length has a 72% chance of being nonfunctional, and that makes sense because 97% of them sort to a molecular mechanism with deletion in its description, but base substitutions do cause loss. Similarly, proved-functional G-boxes go undetectable by deletion 82% of the time. Fractionation mutagenesis is a procedure that uses polyploidy as a mutagenic agent to genetically alter RNA expression profiles, and then to construct testable hypotheses as to the function of the lost regulatory site. We show fractionation mutagenesis to be a deletion machine in the Brassica lineage

    Optimization of DNA extraction from human urinary samples for mycobiome community profiling.

    Introduction: Recent data suggest the urinary tract hosts a microbial community of varying composition, even in the absence of infection. Culture-independent methodologies, such as next-generation sequencing of conserved ribosomal DNA sequences, provide an expansive look at these communities, identifying both common commensals and fastidious organisms. A fundamental challenge has been the isolation of DNA representative of the entire resident microbial community, including fungi. Materials and methods: We evaluated multiple modifications of commonly-used DNA extraction procedures using standardized male and female urine samples, comparing resulting overall, fungal and bacterial DNA yields by quantitative PCR. After identifying protocol modifications that increased DNA yields (lyticase/lysozyme digestion, bead beating, boil/freeze cycles, proteinase K treatment, and carrier DNA use), all modifications were combined for systematic confirmation of optimal protocol conditions. This optimized protocol was tested against commercially available methodologies to compare overall and microbial DNA yields, community representation and diversity by next-generation sequencing (NGS). Results: Overall and fungal-specific DNA yields from standardized urine samples demonstrated that microbial abundances differed significantly among the eight methods used. Methodologies that included multiple disruption steps, including enzymatic, mechanical, and thermal disruption and proteinase digestion, particularly in combination with small volume processing and pooling steps, provided more comprehensive representation of the range of bacterial and fungal species. Concentration of larger volume urine specimens at low speed centrifugation proved highly effective, increasing resulting DNA levels and providing greater microbial representation and diversity. Conclusions: Alterations in the methodology of urine storage, preparation, and DNA processing improve microbial community profiling using culture-independent sequencing methods. Our optimized protocol for DNA extraction from urine samples provided improved fungal community representation. Use of this technique resulted in equivalent representation of the bacterial populations as well, making this a useful technique for the concurrent evaluation of bacterial and fungal populations by NGS

    The ACS Survey of Galactic Globular Clusters XI: The Three-Dimensional Orientation of the Sagittarius Dwarf Spheroidal Galaxy and its Globular Clusters

    We use observations from the ACS study of Galactic globular clusters to investigate the spatial distribution of the inner regions of the disrupting Sagittarius dwarf spheroidal galaxy (Sgr). We combine previously published analyses of four Sgr member clusters located near or in the Sgr core (M54, Arp 2, Terzan 7 and Terzan 8) with a new analysis of diffuse Sgr material identified in the background of five low-latitude Galactic bulge clusters (NGC 6624, 6637, 6652, 6681 and 6809) observed as part of the ACS survey. By comparing the bulge cluster CMDs to our previous analysis of the M54/Sgr core, we estimate distances to these background features. The combined data from four Sgr member clusters and five Sgr background features provide nine independent measures of the Sgr distance and, as a group, provide uniformly measured and calibrated probes of different parts of the inner regions of Sgr spanning twenty degrees over the face of the disrupting dwarf. This allows us, for the first time, to constrain the three dimensional orientation of Sgr's disrupting core and globular cluster system and compare that orientation to the predictions of an N-body model of tidal disruption. The density and distance of Sgr debris are consistent with models that favor a relatively high Sgr core mass and a slightly greater distance (28-30 kpc, with a mean of 29.4 kpc). Our analysis also suggests that M54 is in the foreground of Sgr by ~2 kpc, projected on the center of the Sgr dSph. While this would imply a remarkable alignment of the cluster and the Sgr nucleus along the line of sight, we cannot identify any systematic effect in our analysis that would falsely create the measured 2 kpc separation. Finally, we find that the cluster Terzan 7 has the most discrepant distance (25 kpc) among the four Sgr core clusters, which may suggest a different dynamical history than the other Sgr core clusters. Comment: 41 pages, 16 figures, accepted to Ap