126 research outputs found
Shotgun haplotyping: a novel method for surveying allelic sequence variation
Haplotypic sequences contain significantly more information than genotypes of genetic markers and are critical for studying disease association and genome evolution. Current methods for obtaining haplotypic sequences require the physical separation of alleles before sequencing, are time-consuming and are not scalable for large surveys of genetic variation. We have developed a novel method for acquiring haplotypic sequences from long PCR products using simple, high-throughput techniques. This method applies modified shotgun sequencing protocols to sequence both alleles concurrently, with read-pair information allowing the two alleles to be separated during sequence assembly. Although the haplotypic sequences can be assembled manually from the resultant data using pre-existing sequence assembly software, we have devised a novel heuristic algorithm to automate assembly and remove human error. We validated the approach on two long PCR products amplified from the human genome and confirmed the accuracy of our sequences against full-length clones of the same alleles. This method presents a simple high-throughput means to obtain full haplotypic sequences potentially up to 20 kb in length and is suitable for surveying genetic variation even in poorly characterized genomes, as it requires no prior information on sequence variation.
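As a toy illustration of how read-pair information can separate two alleles, the sketch below greedily phases heterozygous sites from paired observations. It is not the authors' heuristic algorithm, which the abstract does not describe; the function name `phase_sites`, the input layout and the example data are all assumptions made for illustration.

```python
from collections import defaultdict


def phase_sites(sites, fragments):
    """Greedy toy phasing of biallelic heterozygous sites.

    sites: list of site positions, sorted in ascending order.
    fragments: list of dicts mapping position -> allele (0 or 1) seen on
               one read pair; both mates are assumed to come from the
               same allele (hypothetical input layout).
    Returns a dict position -> allele carried by haplotype A; haplotype B
    is the complement. Sites with no linking evidence default to 0.
    """
    # votes[(p, q)] > 0: p and q mostly show the same allele value on a
    # fragment; votes[(p, q)] < 0: they mostly differ. Keys have p < q.
    votes = defaultdict(int)
    for frag in fragments:
        covered = sorted(frag)
        for i, p in enumerate(covered):
            for q in covered[i + 1:]:
                votes[(p, q)] += 1 if frag[p] == frag[q] else -1

    phase = {}
    for idx, site in enumerate(sites):
        if idx == 0:
            phase[site] = 0  # arbitrary anchor for haplotype A
            continue
        score = 0
        for prev in sites[:idx]:
            v = votes.get((prev, site), 0)
            # If prev is 0 on haplotype A, "same" evidence supports 0 here;
            # if prev is 1, the same evidence supports 1 instead.
            score += v if phase[prev] == 0 else -v
        phase[site] = 0 if score >= 0 else 1
    return phase


# Made-up example: three heterozygous sites linked by five read pairs.
example_sites = [100, 450, 900]
example_fragments = [
    {100: 0, 450: 1},
    {100: 1, 450: 0},
    {450: 1, 900: 1},
    {450: 0, 900: 0},
    {100: 0, 900: 1},
]
example_phase = phase_sites(example_sites, example_fragments)
```

Here `example_phase` assigns one allele per site to haplotype A, and the second haplotype is its complement. A real assembler would also have to handle sequencing errors, conflicting pairs and sites that no read pair links.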
Crumble: reference free lossy compression of sequence quality values
Motivation: The bulk of space taken up by NGS sequencing CRAM files consists of per-base quality values. Most of these are unnecessary for variant calling, offering an opportunity for space saving. Results: On the Syndip test set, a 17-fold reduction in the quality storage portion of a CRAM file can be achieved while maintaining variant calling accuracy. The size reduction of an entire CRAM file varied from 2.2- to 7.4-fold, depending on the non-quality content of the original file (see Supplementary Material S6 for details). Availability and implementation: Crumble is open source and can be obtained from https://github.com/jkbonfield/crumble. Supplementary information: Supplementary data are available at Bioinformatics online. This work was funded by the Wellcome Trust [WT098051].
Twelve years of SAMtools and BCFtools.
BACKGROUND: SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. FINDINGS: The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. CONCLUSION: Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed >1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org
Evolution of star formation in the UKIDSS Ultra Deep Survey Field - I. Luminosity functions and cosmic star formation rate out to z = 1.6
We present new results on the cosmic star formation history in the Subaru/XMM–Newton Deep Survey (SXDS)–Ultra Deep Survey (UDS) field out to z = 1.6. We compile narrow-band data from the Subaru Telescope and the Visible and Infrared Survey Telescope for Astronomy (VISTA) in conjunction with broad-band data from the SXDS and UDS, to make a selection of 5725 emission-line galaxies in 12 redshift slices, spanning 10 Gyr of cosmic time. We determine photometric redshifts for the sample using 11-band photometry, and use a spectroscopically confirmed subset to fine tune the resultant redshift distribution. We use the maximum-likelihood technique to determine luminosity functions in each redshift slice and model the selection effects inherent in any narrow-band selection statistically, to obviate the retrospective corrections ordinarily required. The deep narrow-band data are sensitive to very low star formation rates (SFRs), and allow an accurate evaluation of the faint end slope of the Schechter function, α. We find that α is particularly sensitive to the assumed faintest broad-band magnitude of a galaxy capable of hosting an emission line, and propose that this limit should be empirically motivated. For this analysis, we base our threshold on the limiting observed equivalent widths of emission lines in the local Universe. We compute the characteristic SFR of galaxies in each redshift slice, and the integrated SFR density, ρ_SFR. We find our results to be in good agreement with the literature and parametrize the evolution of the SFR density as ρ_SFR ∝ (1 + z)^4.58, confirming a steep decline in star formation activity since z ∼ 1.6.
Keywords: surveys – galaxies: evolution – galaxies: formation – galaxies: high-redshift – galaxies: star formation – cosmology: observations
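To make the quoted scaling concrete, the sketch below evaluates the ratio implied by ρ_SFR ∝ (1 + z)^4.58 and the standard Schechter luminosity function whose faint-end slope α the abstract refers to. The abstract gives only a proportionality, so only ratios relative to z = 0 are computed; the function names and the sample parameter values are assumptions for illustration, not values from the paper.

```python
import math


def sfr_density_ratio(z, exponent=4.58):
    """Ratio rho_SFR(z) / rho_SFR(0) under rho_SFR ∝ (1 + z)**exponent."""
    return (1.0 + z) ** exponent


def schechter(lum, phi_star, lum_star, alpha):
    """Standard Schechter function: number density per unit luminosity.

    phi(L) = (phi*/L*) * (L/L*)**alpha * exp(-L/L*)
    alpha is the faint-end slope; phi_star and lum_star set the
    normalisation and the characteristic luminosity.
    """
    x = lum / lum_star
    return (phi_star / lum_star) * x ** alpha * math.exp(-x)


# Decline in SFR density between z = 1.6 and the present day.
ratio_at_z16 = sfr_density_ratio(1.6)

# Hypothetical parameters, chosen only to show the shape of the function:
# a steeper (more negative) alpha boosts the faint end relative to L*.
faint_steep = schechter(0.01, phi_star=1.0, lum_star=1.0, alpha=-1.6)
faint_shallow = schechter(0.01, phi_star=1.0, lum_star=1.0, alpha=-1.2)
```

Under this scaling the SFR density at z = 1.6 is roughly 80 times its present-day value (2.6^4.58), which is the steep decline the abstract describes.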
Evolution of star formation in the UKIDSS Ultra Deep Survey Field - I. Luminosity functions and cosmic star formation rate out to z = 1.6 (vol 433, pg 796, 2013)
We present new results on the cosmic star formation history in the Subaru/XMM–Newton Deep Survey (SXDS)–Ultra Deep Survey (UDS) field out to z = 1.6. We compile narrow-band data from the Subaru Telescope and the Visible and Infrared Survey Telescope for Astronomy (VISTA) in conjunction with broad-band data from the SXDS and UDS, to make a selection of 5725 emission-line galaxies in 12 redshift slices, spanning 10 Gyr of cosmic time. We determine photometric redshifts for the sample using 11-band photometry, and use a spectroscopically confirmed subset to fine tune the resultant redshift distribution. We use the maximum-likelihood technique to determine luminosity functions in each redshift slice and model the selection effects inherent in any narrow-band selection statistically, to obviate the retrospective corrections ordinarily required. The deep narrow-band data are sensitive to very low star formation rates (SFRs), and allow an accurate evaluation of the faint end slope of the Schechter function, α. We find that α is particularly sensitive to the assumed faintest broad-band magnitude of a galaxy capable of hosting an emission line, and propose that this limit should be empirically motivated. For this analysis, we base our threshold on the limiting observed equivalent widths of emission lines in the local Universe. We compute the characteristic SFR of galaxies in each redshift slice, and the integrated SFR density, ρ_SFR. We find our results to be in good agreement with the literature and parametrize the evolution of the SFR density as ρ_SFR ∝ (1 + z)^4.58, confirming a steep decline in star formation activity since z ∼ 1.6.
Key words: surveys – galaxies: evolution – galaxies: formation – galaxies: high-redshift – galaxies: star formation – cosmology: observations
HTSlib: C library for reading/writing high-throughput sequencing data
Background:
Since the original publication of the VCF and SAM formats, an explosion of software tools has been created to process these data files. To facilitate this, a library was produced out of the original SAMtools implementation, with a focus on performance and robustness. The file formats themselves have become international standards under the jurisdiction of the Global Alliance for Genomics and Health.
Findings:
We present a software library for providing programmatic access to sequencing alignment and variant formats. It was born out of the widely used SAMtools and BCFtools applications. Considerable improvements have been made to the original code plus many new features including newer access protocols, the addition of the CRAM file format, better indexing and iterators, and better use of threading.
Conclusion:
Since the original Samtools release, performance has been considerably improved, with a BAM read-write loop running 5 times faster and BAM to SAM conversion 13 times faster (both using 16 threads, compared to Samtools 0.1.19). Widespread adoption has seen HTSlib downloaded >1 million times from GitHub and conda. The C library has been used directly by an estimated 900 GitHub projects and has been incorporated into Perl, Python, Rust, and R, significantly expanding the number of uses via other languages. HTSlib is open source and is freely available from htslib.org under the MIT/BSD license.
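For readers unfamiliar with the alignment format mentioned above, the sketch below splits one SAM text record into its eleven mandatory tab-separated fields plus optional `TAG:TYPE:VALUE` entries. It does not use HTSlib or any of its bindings; `SamRecord` and `parse_sam_line` are hypothetical names, and the example line is made up.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The eleven mandatory SAM alignment fields plus optional tags."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict


def parse_sam_line(line):
    """Parse one alignment line (not a '@' header line) of SAM text."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    tags = {}
    for opt in fields[11:]:
        # Optional fields look like TAG:TYPE:VALUE; the value itself may
        # contain colons, so split at most twice.
        tag, typ, value = opt.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]),
        int(fields[4]), fields[5], fields[6], int(fields[7]),
        int(fields[8]), fields[9], fields[10], tags,
    )


# Made-up record for illustration.
example = parse_sam_line("r001\t99\tchr1\t7\t30\t8M\t=\t37\t39\tTTAGATAA\t*\tNM:i:1")
is_paired = bool(example.flag & 0x1)  # flag bit 0x1: read is paired
```

This handles only uncompressed text records; binary BAM and CRAM files, indexing and threading are what a library such as HTSlib provides.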
Evolution of star formation in the UKIDSS ultra deep survey field-I. Luminosity functions and cosmic star formation rate out to z = 1.6
We present new results on the cosmic star formation history in the Subaru/XMM-Newton Deep Survey (SXDS)-Ultra Deep Survey (UDS) field out to z = 1.6. We compile narrow-band data from the Subaru Telescope and the Visible and Infrared Survey Telescope for Astronomy (VISTA) in conjunction with broad-band data from the SXDS and UDS, to make a selection of 5725 emission-line galaxies in 12 redshift slices, spanning 10 Gyr of cosmic time. We determine photometric redshifts for the sample using 11-band photometry, and use a spectroscopically confirmed subset to fine tune the resultant redshift distribution. We use the maximum-likelihood technique to determine luminosity functions in each redshift slice and model the selection effects inherent in any narrow-band selection statistically, to obviate the retrospective corrections ordinarily required. The deep narrow-band data are sensitive to very low star formation rates (SFRs), and allow an accurate evaluation of the faint end slope of the Schechter function, α. We find that α is particularly sensitive to the assumed faintest broad-band magnitude of a galaxy capable of hosting an emission line, and propose that this limit should be empirically motivated. For this analysis, we base our threshold on the limiting observed equivalent widths of emission lines in the local Universe. We compute the characteristic SFR of galaxies in each redshift slice, and the integrated SFR density, ρ_SFR. We find our results to be in good agreement with the literature and parametrize the evolution of the SFR density as ρ_SFR ∝ (1 + z)^4.58, confirming a steep decline in star formation activity since z ~ 1.6. Peer reviewed.
The Right Light – De Novo Design of a Robust Modular Photochemical Reactor for Optimum Batch and Flow Chemistry
Having identified inconsistencies when repeating literature examples of photochemical transformations and difficulties recreating experimental setups, we devised several criteria that an ideal lab-scale reactor should achieve. Herein, we introduce a versatile photoreactor for high-throughput screening, preparative-scale batch reactions and continuous processing, all with a single light source. The reactor utilizes interchangeable arrays of pseudo-monochromatic high-power LEDs in a range of synthetically useful wavelengths, combined with excellent temperature control. Moreover, light intensity can be modulated in an accurate and straightforward manner. This system has subsequently been tested on a range of literature methodologies.
Petabyte-scale innovations at the European Nucleotide Archive
Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence repositories, which provide the feed for these data sets into the worldwide computational infrastructure, are challenged by the impact of these data volumes. The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/embl), comprising the EMBL Nucleotide Sequence Database and the Ensembl Trace Archive, has identified challenges in the storage, movement, analysis, interpretation and visualization of petabyte-scale data sets. We present here our new repository for next-generation sequence data and a brief summary of the contents of the ENA, and provide details of major developments to submission pipelines, high-throughput rule-based validation infrastructure and data integration approaches.
- …