34 research outputs found

    Data structures and compression algorithms for high-throughput sequencing technologies

    Background: High-throughput sequencing (HTS) technologies play important roles in the life sciences by allowing the rapid parallel sequencing of very large numbers of relatively short nucleotide sequences, in applications ranging from genome sequencing and resequencing to digital microarrays and ChIP-Seq experiments. As experiments scale up, HTS technologies create new bioinformatics challenges for the storage and sharing of HTS data.

    Results: We develop data structures and compression algorithms for HTS data. A processing stage maps short sequences to a reference genome or a large table of sequences. Then the integers representing the short sequences' absolute or relative addresses, their lengths, and the substitutions they may contain are compressed and stored using various entropy coding algorithms, including both old and new fixed codes (e.g., Golomb, Elias Gamma, MOV) and variable codes (e.g., Huffman). The general methodology is illustrated and applied to several HTS data sets. Results show that the information contained in HTS files can be compressed by a factor of 10 or more, depending on the statistical properties of the data sets and various other choices and constraints. Our algorithms fare well against general-purpose compression programs such as gzip, bzip2 and 7zip; timing results show that our algorithms are consistently faster than the best general-purpose compression programs.

    Conclusions: It is not likely that exactly one encoding strategy will be optimal for all types of HTS data. Different experimental conditions are going to generate various data distributions whereby one encoding strategy can be more effective than another. We have implemented some of our encoding algorithms in the software package GenCompress, which is available upon request from the authors. With the advent of HTS technology and increasingly new experimental protocols for using the technology, sequence databases are expected to continue rising in size. The methodology we have proposed is general, and these advanced compression techniques should allow researchers to manage and share their HTS data in a more timely fashion.
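As a rough illustration of the idea in the Results above (compressing the integers that represent relative read addresses with a fixed code), the sketch below gap-encodes sorted mapping positions and writes each gap with an Elias gamma code. It is a minimal sketch under stated assumptions, not the GenCompress implementation: the function names, the `+1` gap offset, and the use of a bit string instead of packed bytes are all choices made here for readability.

```python
def elias_gamma_encode(n: int) -> str:
    """Elias gamma code of a positive integer, returned as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma codes are defined for integers >= 1")
    binary = bin(n)[2:]
    # Prefix with (length - 1) zeros so the decoder knows how many bits follow.
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode_stream(bits: str) -> list:
    """Decode a concatenation of Elias gamma codewords."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def compress_positions(positions) -> str:
    """Hypothetical helper: sort positions, then gamma-code the gaps (relative addresses)."""
    encoded = []
    prev = 0
    for p in sorted(positions):
        # Assumption: store gap + 1 so that a zero gap (duplicate read start) is codable.
        encoded.append(elias_gamma_encode(p - prev + 1))
        prev = p
    return "".join(encoded)


def decompress_positions(bits: str) -> list:
    """Inverse of compress_positions: rebuild absolute positions from the gaps."""
    positions = []
    prev = 0
    for g in elias_gamma_decode_stream(bits):
        prev += g - 1
        positions.append(prev)
    return positions
```

Small gaps get short codewords, so densely mapped reads cost few bits each; whether gamma, Golomb, or Huffman coding works best depends on the gap distribution, which matches the abstract's point that no single encoding is likely to be optimal for all data sets.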

    MotifMap: integrative genome-wide maps of regulatory motif sites for model species

    Background: A central challenge of biology is to map and understand gene regulation on a genome-wide scale. For any given genome, only a small fraction of the regulatory elements embedded in the DNA sequence have been characterized, and there is great interest in developing computational methods to systematically map all these elements and understand their relationships. Such computational efforts, however, are significantly hindered by the overwhelming size of non-coding regions and the statistical variability and complex spatial organizations of regulatory elements and interactions. Genome-wide catalogs of regulatory elements for all model species simply do not yet exist.

    Results: The MotifMap system uses databases of transcription factor binding motifs, refined genome alignments, and a comparative genomic statistical approach to provide comprehensive maps of candidate regulatory elements encoded in the genomes of model species. The system is used to derive new genome-wide maps for yeast, fly, worm, mouse, and human. The human map contains 519,108 sites for 570 matrices with a False Discovery Rate of 0.1 or less. The new maps are assessed in several ways, for instance using high-throughput experimental ChIP-seq data and AUC statistics, providing strong evidence for their accuracy and coverage. The maps can be usefully integrated with many other kinds of omic data and are available at http://motifmap.igb.uci.edu/.

    Conclusions: MotifMap and its integration with other data provide a foundation for analyzing gene regulation on a genome-wide scale, and for automatically generating regulatory pathways and hypotheses. The power of this approach is demonstrated and discussed using the P53 apoptotic pathway and the Gli hedgehog pathways as examples.

    Increased fluxes of shelf-derived materials to the central Arctic Ocean

    © The Author(s), 2018. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Science Advances 4 (2018): eaao1302, doi:10.1126/sciadv.aao1302.

    Rising temperatures in the Arctic Ocean region are responsible for changes such as reduced ice cover, permafrost thawing, and increased river discharge, which, together, alter nutrient and carbon cycles over the vast Arctic continental shelf. We show that the concentration of radium-228, sourced to seawater through sediment-water exchange processes, has increased substantially in surface waters of the central Arctic Ocean over the past decade. A mass balance model for 228Ra suggests that this increase is due to an intensification of shelf-derived material inputs to the central basin, a source that would also carry elevated concentrations of dissolved organic carbon and nutrients. Therefore, we suggest that significant changes in the nutrient, carbon, and trace metal balances of the Arctic Ocean are underway, with the potential to affect biological productivity and species assemblages in Arctic surface waters.

    This work was funded by NSF awards OCE-1458305 to M.A.C. and OCE-1458424 to W.S.M. The Mackenzie River sampling was supported by a Graduate Student Research Award from the North Pacific Research Board to L.E.K. L.E.K. also acknowledges support from a National Defense Science and Engineering Graduate Fellowship. I.G.R. acknowledges funding by the contributors to the U.S. Interagency Arctic Buoy Program, which include the U.S. Coast Guard, the Department of Energy, NASA, the U.S. Navy, the National Oceanic and Atmospheric Administration, and NSF.

    Retrotransposon profiling of RNA polymerase III initiation sites

    Although retroviruses are relatively promiscuous in choice of integration sites, retrotransposons can display marked integration specificity. In yeast and slime mold, some retrotransposons are associated with tRNA genes (tDNAs). In the Saccharomyces cerevisiae genome, the long terminal repeat retrotransposon Ty3 is found at RNA polymerase III (Pol III) transcription start sites of tDNAs. Ty1, 2, and 4 elements also cluster in the upstream regions of these genes. To determine the extent to which other Pol III-transcribed genes serve as genomic targets for Ty3, a set of 10,000 Ty3 genomic retrotranspositions were mapped using high-throughput DNA sequencing. Integrations occurred at all known tDNAs, two tDNA relics (iYGR033c and ZOD1), and six non-tDNA, Pol III-transcribed types of genes (RDN5, SNR6, SNR52, RPR1, RNA170, and SCR1). Previous work in vitro demonstrated that the Pol III transcription factor (TF) IIIB is important for Ty3 targeting. However, seven loci that bind the TFIIIB loader, TFIIIC, were not targeted, underscoring the unexplained absence of TFIIIB at those sites. Ty3 integrations also occurred in two open reading frames not previously associated with Pol III transcription, suggesting the existence of a small number of additional sites in the yeast genome that interact with Pol III transcription complexes

    SKS Splitting beneath Continental Rift Zones

    We present measurements of SKS splitting at 28 digital seismic stations and 35 analog stations in the Baikal rift zone, Siberia, and adjacent areas, and at 17 stations in the East African Rift in Kenya and compare them with previous measurements from the Rio Grande Rift of North America. Fast directions in the inner region of the Baikal rift zone are distributed in two orthogonal directions, NE and NW, approximately parallel and perpendicular to the NE strike of the rift. In the adjacent Siberian platform and northern Mongolian fold belt, only the rift-orthogonal fast direction is observed. In southcentral Mongolia, the dominant fast direction changes to rift-parallel again, although a small number of measurements are still rift-orthogonal. For the axial zones of the East African and Rio Grande Rifts, fast directions are oriented on average NNE, that is, rotated clockwise from the N-S trending rift. All three rifts are underlain by low-velocity upper mantle as determined from teleseismic tomography. Rift-related mantle flow provides a plausible interpretation for the rift-orthogonal fast directions. The rift-parallel fast directions near the rift axes can be interpreted by oriented magmatic cracks in the mantle or small-scale mantle convection with rift-parallel flow. The agreement between stress estimates and corresponding crack orientations lends some weight to the suggestion that the rift-parallel fast directions are caused by oriented magmatic cracks.

    Reply [to “Comment on ‘SKS Splitting beneath Continental Rift Zones’ by Gao et al.”]

    Vauchez et al. [this issue] (hereinafter referred to as VBN) interpret the petrologic, tomographic, and anisotropy data from continental rifts to support a model of continental rifting [Nicolas, 1993; Nicolas et al., 1994] in which the lithosphere splits along the rift axis and asthenosphere flows in from the sides to fill the resulting gap. We suggest here that the data can also be described by a model in which the lower lithosphere is modified or eroded by active mantle upwelling over a region of significantly greater dimensions than the rift graben and that partial melt developing in the upwelling region can account for the widespread volcanism, as well as the seismic properties. Nicolas [1993] argued that rift-aligned anisotropy could be explained by rift-parallel mantle flow. We thank VBN for bringing this relevant paper to our attention. Volcanism about the East African Rift and the Rio Grande is not confined to the rifts but extends hundreds of kilometers from the rift axes (Mount Kilimanjaro, Mount Elgon, and Mount Kenya in East Africa; the Jemez Lineament on the Rio Grande) in regions uplifted relative to their surroundings. The low-velocity tomographic anomalies also extend beneath the uplifted regions and are thought to be related to the uplift, possibly supporting it by thermostatic buoyancy. The size of the P and S velocity contrasts and attenuation of high frequencies have led to the suggestion that large regions of the anomalous bodies have temperatures at or above the solidus [Achauer et al., 1994; Slack et al., 1994, 1996]. The wide extent of the anomalous regions is not explicable as resulting from an abyssal lithospheric dike beneath the rift intruded by asthenosphere. The extension of the East African, Baikal, and Rio Grande rift grabens has been estimated to be about 10 km [Baker et al., 1972; Baldridge et al., 1984; Morgan and Golombek, 1984; Logatchev and Florensov, 1978]. Passive influx of asthenosphere into a 10 km lithospheric dike is insufficient to explain the tomographic anomalies [Davis, 1991]. In addition, the amount of finite strain from lithospheric diking is insufficient to explain the anisotropy anomalies. Active replacement or modification of lower lithosphere either prior to, or contemporaneous with, rifting could generate tomographic anomalies of this magnitude.

    Remote Sensing of Antarctic Sea Ice with Coordinated Aircraft and Satellite Data Acquisitions

    Remote sensing of Antarctic sea ice is required to characterize properties of the vast sea ice cover to understand its long-term increase in contrast to the decrease of Arctic sea ice. For this objective, the OIB/TanDEM-X Coordinated Science Campaign (OTASC) was successfully conducted in 2017 to obtain contemporaneous and collocated remote sensing data from NASA's Operation IceBridge (OIB) and the German Aerospace Center (DLR) TanDEM-X Synthetic Aperture Radar (SAR) system at X-band together with Sentinel-1 and RADARSAT-2 SARs at C-band in conjunction with WorldView satellite spectral sensors, surface measurements, and field observations. The Weddell Sea and the Ross Sea were two primary regions while SAR data were also collected over six other regions in the Southern Ocean. Satellite SAR data included both polarimetric and interferometric capabilities to infer snow and sea ice information in three dimensions (3D), while OIB/P-3 aircraft data include snow radar together with altimeter data for snow and sea ice observations in 3D over the Weddell Sea. Across the Ross Sea, IcePOD and AntNZ/York-University flights were carried out together with satellite SAR data acquisitions

    Field and Satellite Observations of the Formation and Distribution of Arctic Atmospheric Bromine Above a Rejuvenated Sea Ice Cover

    Recent drastic reduction of the older perennial sea ice in the Arctic Ocean has resulted in a vast expansion of younger and saltier seasonal sea ice. This increase in the salinity of the overall ice cover could impact tropospheric chemical processes. Springtime perennial ice extent in 2008 and 2009 broke the half-century record minimum in 2007 by about one million km². In both years seasonal ice was dominant across the Beaufort Sea extending to the Amundsen Gulf, where significant field and satellite observations of sea ice, temperature, and atmospheric chemicals have been made. Measurements at the site of the Canadian Coast Guard Ship Amundsen ice breaker in the Amundsen Gulf showed events of increased bromine monoxide (BrO), coupled with decreases of ozone (O3) and gaseous elemental mercury (GEM), during cold periods in March 2008. The timing of the main event of BrO, O3, and GEM changes was found to be consistent with BrO observed by satellites over an extensive area around the site. Furthermore, satellite sensors detected a doubling of atmospheric BrO in a vortex associated with a spiral rising air pattern. In spring 2009, excessive and widespread bromine explosions occurred in the same region while the regional air temperature was low and the extent of perennial ice was significantly reduced compared to the case in 2008. Using satellite observations together with a Rising-Air-Parcel model, we discover a topographic control on BrO distribution such that the Alaskan North Slope and the Canadian Shield region were exposed to elevated BrO, whereas the surrounding mountains isolated the Alaskan interior from bromine intrusion.

    Decoding the Molecular Universe -- Workshop Report

    On August 9-10, 2023, a workshop was convened at the Pacific Northwest National Laboratory (PNNL) in Richland, WA that brought together a group of internationally recognized experts in metabolomics, natural products discovery, chemical ecology, chemical and biological threat assessment, cheminformatics, computational chemistry, cloud computing, artificial intelligence, and novel technology development. These experts were invited to assess the value and feasibility of a grand-scale project to create new technologies that would allow the identification and quantification of all small molecules, or to decode the molecular universe. The Decoding the Molecular Universe project would extend and complement the success of the Human Genome Project by developing new capabilities and technologies to measure small molecules (defined as non-protein, non-polymer molecules less than 1500 Daltons) of any origin and generated in biological systems or produced abiotically. Workshop attendees 1) explored what new understanding of biological and environmental systems could be revealed through the lens of small molecules; 2) characterized the similarities in current needs and technical challenges between each science or mission area for unambiguous and comprehensive determination of the composition and quantities of small molecules of any sample; 3) determined the extent to which technologies or methods currently exist for unambiguously and comprehensively determining the small molecule composition of any sample and in a reasonable time; and 4) identified the attributes of the ideal technology or approach for universal small molecule measurement and identification. The workshop concluded with a discussion of how a project of this scale could be undertaken, possible thrusts for the project, early proof-of-principle applications, and similar efforts upon which the project could be modeled