Data structures and compression algorithms for high-throughput sequencing technologies
Background: High-throughput sequencing (HTS) technologies play important roles in the life sciences by allowing the rapid parallel sequencing of very large numbers of relatively short nucleotide sequences, in applications ranging from genome sequencing and resequencing to digital microarrays and ChIP-Seq experiments. As experiments scale up, HTS technologies create new bioinformatics challenges for the storage and sharing of HTS data.
Results: We develop data structures and compression algorithms for HTS data. A processing stage maps short sequences to a reference genome or a large table of sequences. The integers representing each short sequence's absolute or relative address, its length, and any substitutions it contains are then compressed and stored using various entropy coding algorithms, including both old and new fixed codes (e.g. Golomb, Elias Gamma, MOV) and variable codes (e.g. Huffman). The general methodology is illustrated and applied to several HTS data sets. Results show that the information contained in HTS files can be compressed by a factor of 10 or more, depending on the statistical properties of the data sets and various other choices and constraints. Our algorithms fare well against general-purpose compression programs such as gzip, bzip2 and 7zip, and timing results show that they are consistently faster than the best general-purpose compression programs.
Conclusions: It is unlikely that a single encoding strategy will be optimal for all types of HTS data. Different experimental conditions will generate different data distributions, for which one encoding strategy may be more effective than another. We have implemented some of our encoding algorithms in the software package GenCompress, which is available upon request from the authors. With the advent of HTS technology and a growing number of experimental protocols that use it, sequence databases are expected to keep growing in size. The methodology we have proposed is general, and these advanced compression techniques should allow researchers to manage and share their HTS data in a more timely fashion.
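The address-coding step can be illustrated with a minimal Python sketch. It assumes reads have already been mapped, so each read reduces to an integer start address; sorting the addresses and storing the gaps between them (relative addresses) yields small integers that universal codes represent compactly. Elias gamma codes and the power-of-two special case of Golomb codes (Rice codes) are shown. All function names are hypothetical, and this is not the GenCompress implementation, which also encodes read lengths and substitutions.

```python
# Illustrative sketch only (not GenCompress): gap-encode sorted read addresses,
# then write each gap with a universal integer code. Names are hypothetical.


def elias_gamma(n: int) -> str:
    """Elias gamma code of n >= 1: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias gamma is defined for n >= 1")
    binary = bin(n)[2:]  # e.g. 5 -> '101'
    return "0" * (len(binary) - 1) + binary


def golomb_rice(n: int, k: int) -> str:
    """Golomb code with divisor m = 2**k (a Rice code) for n >= 0."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    q, r = n >> k, n & ((1 << k) - 1)
    remainder = format(r, f"0{k}b") if k > 0 else ""
    return "1" * q + "0" + remainder  # unary quotient, then k-bit remainder


def gaps(positions):
    """Convert absolute addresses into relative addresses (gaps) after sorting."""
    out, prev = [], 0
    for p in sorted(positions):
        out.append(p - prev)
        prev = p
    return out


def encode_positions(positions, k=None):
    """Gamma-code gap + 1 (gaps can be 0 for duplicate reads), or Rice-code gap."""
    return "".join(
        elias_gamma(g + 1) if k is None else golomb_rice(g, k)
        for g in gaps(positions)
    )


def decode_gamma(bits: str) -> list:
    """Decode a concatenation of Elias gamma codewords."""
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the unary length prefix
            zeros += 1
            i += 1
        out.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return out


reads = [100, 103, 103, 110]
bits = encode_positions(reads)
print(len(bits), decode_gamma(bits))  # 26 [101, 4, 1, 8]
```

A gamma code needs no parameter but spends 2⌊log₂ n⌋ + 1 bits on each value n, whereas a Golomb code with a well-chosen divisor is optimal for geometrically distributed gaps; which one wins depends on the gap distribution, in line with the abstract's conclusion that no single encoding is best for every data set.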
MotifMap: integrative genome-wide maps of regulatory motif sites for model species
Background: A central challenge of biology is to map and understand gene regulation on a genome-wide scale. For any given genome, only a small fraction of the regulatory elements embedded in the DNA sequence have been characterized, and there is great interest in developing computational methods to systematically map all these elements and understand their relationships. Such computational efforts, however, are significantly hindered by the overwhelming size of non-coding regions and by the statistical variability and complex spatial organization of regulatory elements and interactions. Genome-wide catalogs of regulatory elements for all model species simply do not yet exist.
Results: The MotifMap system uses databases of transcription factor binding motifs, refined genome alignments, and a comparative genomic statistical approach to provide comprehensive maps of candidate regulatory elements encoded in the genomes of model species. The system is used to derive new genome-wide maps for yeast, fly, worm, mouse, and human. The human map contains 519,108 sites for 570 matrices with a False Discovery Rate of 0.1 or less. The new maps are assessed in several ways, for instance using high-throughput experimental ChIP-seq data and AUC statistics, providing strong evidence for their accuracy and coverage. The maps can be usefully integrated with many other kinds of omic data and are available at http://motifmap.igb.uci.edu/.
Conclusions: MotifMap and its integration with other data provide a foundation for analyzing gene regulation on a genome-wide scale and for automatically generating regulatory pathways and hypotheses. The power of this approach is demonstrated and discussed using the P53 apoptotic pathway and the Gli hedgehog pathway as examples.
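The AUC statistic used in the assessment has a simple rank interpretation: it is the probability that a randomly chosen true site scores higher than a randomly chosen false one. The sketch below assumes each candidate site carries a numeric score and a label saying whether it overlaps a ChIP-seq peak; the scores, the labels, and the function name are hypothetical, and MotifMap's own evaluation may compute AUC differently.

```python
def auc(pos_scores, neg_scores):
    """AUC as P(score of a positive site > score of a negative site), ties = 1/2.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical motif-site scores, split by overlap with a ChIP-seq peak.
in_peaks = [0.9, 0.8, 0.4]
outside_peaks = [0.7, 0.3]
print(auc(in_peaks, outside_peaks))  # 0.8333333333333334
```

The pairwise loop is O(n_pos · n_neg) and is meant only to make the definition explicit; for genome-scale site sets, a rank-based computation after a single sort gives the same value far more cheaply.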
Increased fluxes of shelf-derived materials to the central Arctic Ocean
© The Author(s), 2018. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Science Advances 4 (2018): eaao1302, doi:10.1126/sciadv.aao1302.
Rising temperatures in the Arctic Ocean region are responsible for changes such as reduced ice cover, permafrost thawing, and increased river discharge, which, together, alter nutrient and carbon cycles over the vast Arctic continental shelf. We show that the concentration of radium-228, sourced to seawater through sediment-water exchange processes, has increased substantially in surface waters of the central Arctic Ocean over the past decade. A mass balance model for ²²⁸Ra suggests that this increase is due to an intensification of shelf-derived material inputs to the central basin, a source that would also carry elevated concentrations of dissolved organic carbon and nutrients. Therefore, we suggest that significant changes in the nutrient, carbon, and trace metal balances of the Arctic Ocean are underway, with the potential to affect biological productivity and species assemblages in Arctic surface waters.
This work was funded by NSF awards OCE-1458305 to M.A.C. and OCE-1458424 to W.S.M. The Mackenzie River sampling was supported by a Graduate Student Research Award from the North Pacific Research Board to L.E.K. L.E.K. also acknowledges support from a National Defense Science and Engineering Graduate Fellowship. I.G.R. acknowledges funding by the contributors to the U.S. Interagency Arctic Buoy Program, which include the U.S. Coast Guard, the Department of Energy, NASA, the U.S. Navy, the National Oceanic and Atmospheric Administration, and NSF.
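The abstract does not spell out its ²²⁸Ra mass balance model, so the following is only a generic one-box sketch of the idea, not the authors' model. It assumes a single well-mixed surface box with inventory I, a shelf-derived input flux F, radioactive decay with constant λ (²²⁸Ra half-life of about 5.75 years), and one flushing time τ; the residence time, units, and function names are placeholders.

```python
import math

RA228_HALF_LIFE_YR = 5.75                        # half-life of 228Ra, years
DECAY_CONST = math.log(2) / RA228_HALF_LIFE_YR   # lambda, 1/yr (about 0.12)


def steady_state_flux(inventory, residence_time_yr):
    """Input flux F that balances decay and flushing in one well-mixed box.

    dI/dt = F - lambda*I - I/tau, so at steady state F = I * (lambda + 1/tau).
    """
    return inventory * (DECAY_CONST + 1.0 / residence_time_yr)


def simulate(inventory0, flux, residence_time_yr, years, dt=0.01):
    """Forward-Euler integration of dI/dt = F - (lambda + 1/tau) * I."""
    k = DECAY_CONST + 1.0 / residence_time_yr
    inventory = inventory0
    for _ in range(int(round(years / dt))):
        inventory += dt * (flux - k * inventory)
    return inventory


# Placeholder numbers in arbitrary units: with tau held fixed, a higher observed
# inventory implies a proportionally larger shelf-derived input flux.
tau = 10.0
print(steady_state_flux(2.0, tau) / steady_state_flux(1.0, tau))  # 2.0
```

In this simplified picture, a sustained rise in surface ²²⁸Ra with unchanged circulation can only be balanced by a larger input flux, which is the kind of inference the abstract describes; a real model would also need lateral mixing, ice-related processes, and observed inventories.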
Retrotransposon profiling of RNA polymerase III initiation sites
Although retroviruses are relatively promiscuous in their choice of integration sites, retrotransposons can display marked integration specificity. In yeast and slime mold, some retrotransposons are associated with tRNA genes (tDNAs). In the Saccharomyces cerevisiae genome, the long terminal repeat retrotransposon Ty3 is found at RNA polymerase III (Pol III) transcription start sites of tDNAs. Ty1, Ty2, and Ty4 elements also cluster in the upstream regions of these genes. To determine the extent to which other Pol III-transcribed genes serve as genomic targets for Ty3, a set of 10,000 Ty3 genomic retrotranspositions was mapped using high-throughput DNA sequencing. Integrations occurred at all known tDNAs, two tDNA relics (iYGR033c and ZOD1), and six non-tDNA, Pol III-transcribed types of genes (RDN5, SNR6, SNR52, RPR1, RNA170, and SCR1). Previous work in vitro demonstrated that the Pol III transcription factor (TF) IIIB is important for Ty3 targeting. However, seven loci that bind the TFIIIB loader, TFIIIC, were not targeted, underscoring the unexplained absence of TFIIIB at those sites. Ty3 integrations also occurred in two open reading frames not previously associated with Pol III transcription, suggesting the existence of a small number of additional sites in the yeast genome that interact with Pol III transcription complexes.
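The step of assigning mapped integrations to target genes can be sketched as follows, assuming each sequenced insertion has been reduced to a chromosome and coordinate and each Pol III-transcribed gene to a single transcription start site (TSS). Each insertion is assigned to the nearest TSS within a fixed window. The gene names, coordinates, the symmetric 50 bp window, and all function names are hypothetical; the abstract does not describe the study's own pipeline.

```python
from bisect import bisect_left
from collections import defaultdict


def index_tss(genes):
    """genes: iterable of (name, chrom, tss). Returns chrom -> sorted [(tss, name)]."""
    by_chrom = defaultdict(list)
    for name, chrom, tss in genes:
        by_chrom[chrom].append((tss, name))
    for sites in by_chrom.values():
        sites.sort()
    return by_chrom


def nearest_gene(index, chrom, pos, window):
    """Name of the closest TSS within +/- window bp of pos, or None."""
    sites = index.get(chrom, [])
    i = bisect_left(sites, (pos,))  # first TSS at or after pos
    best = None
    for j in (i - 1, i):  # only the neighbours on either side can be closest
        if 0 <= j < len(sites):
            tss, name = sites[j]
            d = abs(tss - pos)
            if d <= window and (best is None or d < best[0]):
                best = (d, name)
    return best[1] if best else None


def count_targets(index, insertions, window=50):
    """Count insertions per target gene; misses go to 'unassigned'."""
    counts = defaultdict(int)
    for chrom, pos in insertions:
        gene = nearest_gene(index, chrom, pos, window)
        counts[gene if gene is not None else "unassigned"] += 1
    return dict(counts)


# Hypothetical annotation and insertion coordinates.
genes = [("tDNA_A", "chrI", 1000), ("tDNA_B", "chrI", 5000), ("snRNA_C", "chrX", 300)]
hits = [("chrI", 998), ("chrI", 5003), ("chrI", 3000), ("chrX", 295)]
print(count_targets(index_tss(genes), hits))
# {'tDNA_A': 1, 'tDNA_B': 1, 'unassigned': 1, 'snRNA_C': 1}
```

The window here ignores strand for simplicity; since the abstract places Ty3 at Pol III transcription start sites, a more faithful analysis would orient each window by gene strand and report insertion offsets relative to the TSS.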
SKS Splitting beneath Continental Rift Zones
We present measurements of SKS splitting at 28 digital seismic stations and 35 analog stations in the Baikal rift zone, Siberia, and adjacent areas, and at 17 stations in the East African Rift in Kenya, and compare them with previous measurements from the Rio Grande Rift of North America. Fast directions in the inner region of the Baikal rift zone fall into two orthogonal groups, NE and NW, approximately parallel and perpendicular to the NE strike of the rift. In the adjacent Siberian platform and northern Mongolian fold belt, only the rift-orthogonal fast direction is observed. In south-central Mongolia, the dominant fast direction changes back to rift-parallel, although a small number of measurements are still rift-orthogonal. For the axial zones of the East African and Rio Grande Rifts, fast directions are oriented on average NNE, that is, rotated clockwise from the N-S trend of the rifts. All three rifts are underlain by low-velocity upper mantle, as determined from teleseismic tomography. Rift-related mantle flow provides a plausible interpretation of the rift-orthogonal fast directions. The rift-parallel fast directions near the rift axes can be explained by oriented magmatic cracks in the mantle or by small-scale mantle convection with rift-parallel flow. The agreement between stress estimates and the corresponding crack orientations lends some weight to the suggestion that the rift-parallel fast directions are caused by oriented magmatic cracks.
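Averaging fast directions, as in the NNE average reported above, needs care because a fast polarization azimuth is axial: 10° and 190° describe the same orientation, and the arithmetic mean of 170° and 10° gives 90° (E-W) instead of N-S. A common remedy is to double the angles, average the resulting unit vectors, and halve the mean angle. The sketch assumes each measurement is a single azimuth in degrees east of north and weights all measurements equally; the function name is hypothetical, and the abstract does not state how its averages were computed.

```python
import math


def axial_mean_deg(azimuths_deg):
    """Mean of axial directions (180-degree ambiguity) via angle doubling.

    Returns (mean azimuth in [0, 180), mean resultant length R in [0, 1]);
    R near 1 means tightly clustered directions, R near 0 means no preferred one.
    """
    if not azimuths_deg:
        raise ValueError("need at least one azimuth")
    # Doubling maps phi and phi + 180 onto the same point of the circle.
    c = sum(math.cos(math.radians(2.0 * a)) for a in azimuths_deg)
    s = sum(math.sin(math.radians(2.0 * a)) for a in azimuths_deg)
    mean = (math.degrees(math.atan2(s, c)) / 2.0) % 180.0
    if mean >= 180.0:  # guard: a tiny negative angle can round up to 180.0
        mean = 0.0
    r = math.hypot(c, s) / len(azimuths_deg)
    return mean, r


# 170 and 10 degrees straddle north; the axial mean is about 0 (N-S), not 90.
print(axial_mean_deg([170.0, 10.0]))
```

The mean resultant length R is worth reporting alongside the mean: for two orthogonal populations such as the NE and NW groups in the Baikal rift, R is close to 0 and a single average direction would be misleading.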
Reply [to “Comment on ‘SKS Splitting beneath Continental Rift Zones’ by Gao et al.”]
Vauchez et al. [this issue] (hereinafter referred to as VBN) interpret the petrologic, tomographic, and anisotropy data from continental rifts to support a model of continental rifting [Nicolas, 1993; Nicolas et al., 1994] in which the lithosphere splits along the rift axis and asthenosphere flows in from the sides to fill the resulting gap. We suggest here that the data can also be described by a model in which the lower lithosphere is modified or eroded by active mantle upwelling over a region of significantly greater dimensions than the rift graben, and that partial melt developing in the upwelling region can account for the widespread volcanism as well as the seismic properties. Nicolas [1993] argued that rift-aligned anisotropy could be explained by rift-parallel mantle flow. We thank VBN for bringing this relevant paper to our attention.
Volcanism around the East African Rift and the Rio Grande Rift is not confined to the rifts but extends hundreds of kilometers from the rift axes (Mount Kilimanjaro, Mount Elgon, and Mount Kenya in East Africa; the Jemez Lineament along the Rio Grande) in regions uplifted relative to their surroundings. The low-velocity tomographic anomalies also extend beneath the uplifted regions and are thought to be related to the uplift, possibly supporting it by thermostatic buoyancy. The size of the P and S velocity contrasts and the attenuation of high frequencies have led to the suggestion that large regions of the anomalous bodies have temperatures at or above the solidus [Achauer et al., 1994; Slack et al., 1994, 1996]. The wide extent of the anomalous regions cannot be explained as the result of an abyssal lithospheric dike beneath the rift intruded by asthenosphere. The extension of the East African, Baikal, and Rio Grande rift grabens has been estimated to be about 10 km [Baker et al., 1972; Baldridge et al., 1984; Morgan and Golombek, 1984; Logatchev and Florensov, 1978]. Passive influx of asthenosphere into a 10-km-wide lithospheric dike is insufficient to explain the tomographic anomalies [Davis, 1991]. In addition, the finite strain produced by lithospheric diking is insufficient to explain the anisotropy anomalies. Active replacement or modification of the lower lithosphere, either prior to or contemporaneous with rifting, could generate tomographic anomalies of this magnitude.
Remote Sensing of Antarctic Sea Ice with Coordinated Aircraft and Satellite Data Acquisitions
Remote sensing of Antarctic sea ice is required to characterize the properties of the vast sea ice cover and to understand its long-term increase, in contrast to the decrease of Arctic sea ice. To this end, the OIB/TanDEM-X Coordinated Science Campaign (OTASC) was successfully conducted in 2017 to obtain contemporaneous and collocated remote sensing data from NASA's Operation IceBridge (OIB) and the German Aerospace Center (DLR) TanDEM-X Synthetic Aperture Radar (SAR) system at X-band, together with Sentinel-1 and RADARSAT-2 SARs at C-band, WorldView satellite spectral sensors, surface measurements, and field observations. The Weddell Sea and the Ross Sea were the two primary regions, while SAR data were also collected over six other regions in the Southern Ocean. Satellite SAR data included both polarimetric and interferometric capabilities to infer snow and sea ice information in three dimensions (3D), while OIB/P-3 aircraft data included snow radar together with altimeter data for 3D snow and sea ice observations over the Weddell Sea. Across the Ross Sea, IcePOD and AntNZ/York-University flights were carried out together with satellite SAR data acquisitions.
Field and Satellite Observations of the Formation and Distribution of Arctic Atmospheric Bromine Above a Rejuvenated Sea Ice Cover
The recent drastic reduction of older perennial sea ice in the Arctic Ocean has resulted in a vast expansion of younger and saltier seasonal sea ice. This increase in the salinity of the overall ice cover could affect tropospheric chemical processes. Springtime perennial ice extent in 2008 and 2009 fell below the 2007 half-century record minimum by about one million km². In both years, seasonal ice was dominant across the Beaufort Sea, extending to the Amundsen Gulf, where significant field and satellite observations of sea ice, temperature, and atmospheric chemicals have been made. Measurements at the site of the Canadian Coast Guard Ship Amundsen icebreaker in the Amundsen Gulf showed events of increased bromine monoxide (BrO), coupled with decreases of ozone (O₃) and gaseous elemental mercury (GEM), during cold periods in March 2008. The timing of the main event of BrO, O₃, and GEM changes was found to be consistent with BrO observed by satellites over an extensive area around the site. Furthermore, satellite sensors detected a doubling of atmospheric BrO in a vortex associated with a spiral rising air pattern. In spring 2009, excessive and widespread bromine explosions occurred in the same region while the regional air temperature was low and the extent of perennial ice was significantly reduced compared to 2008. Using satellite observations together with a Rising-Air-Parcel model, we found a topographic control on BrO distribution: the Alaskan North Slope and the Canadian Shield region were exposed to elevated BrO, whereas the surrounding mountains isolated the Alaskan interior from bromine intrusion.
Consistent and contrasting decadal Arctic sea ice thickness predictions from a highly optimized sea ice model
Decadal hindcast simulations of Arctic Ocean sea ice thickness made by a modern dynamic-thermodynamic sea ice model and forced independently by both the ERA-40 and NCEP/NCAR reanalysis data sets are compared for the first time. Using comprehensive data sets of observations of sea ice thickness, draft, extent, and speeds made between 1979 and 2001, we find that it is possible to tune model parameters to give satisfactory agreement with observed data, thereby highlighting the skill of modern sea ice models, though the parameter values chosen differ according to the model forcing used. We find a consistent decreasing trend in Arctic Ocean sea ice thickness since 1979 and a steady decline in the Eastern Arctic Ocean over the full 40-year period of comparison that accelerated after 1980, but the predictions of Western Arctic Ocean sea ice thickness between 1962 and 1980 differ substantially. The differing thickness trends and variability were traced not to parameter differences but to differences in the forcing fields applied and in how they are applied. It is argued that uncertainties, differences, and errors in sea ice model forcing sets complicate the use of models to determine the exact causes of the recently reported decline in Arctic sea ice thickness, but help in the determination of robust features if the models are tuned appropriately against observations.
Decoding the Molecular Universe -- Workshop Report
On August 9-10, 2023, a workshop was convened at the Pacific Northwest National Laboratory (PNNL) in Richland, WA, bringing together a group of internationally recognized experts in metabolomics, natural products discovery, chemical ecology, chemical and biological threat assessment, cheminformatics, computational chemistry, cloud computing, artificial intelligence, and novel technology development. These experts were invited to assess the value and feasibility of a grand-scale project to create new technologies that would allow the identification and quantification of all small molecules, in other words, to decode the molecular universe. The Decoding the Molecular Universe project would extend and complement the success of the Human Genome Project by developing new capabilities and technologies to measure small molecules (defined as non-protein, non-polymer molecules less than 1500 Daltons) of any origin, whether generated in biological systems or produced abiotically. Workshop attendees 1) explored what new understanding of biological and environmental systems could be revealed through the lens of small molecules; 2) characterized the similarities in current needs and technical challenges across science and mission areas for unambiguous and comprehensive determination of the composition and quantities of small molecules in any sample; 3) determined the extent to which technologies or methods currently exist for unambiguously and comprehensively determining the small molecule composition of any sample in a reasonable time; and 4) identified the attributes of the ideal technology or approach for universal small molecule measurement and identification. The workshop concluded with a discussion of how a project of this scale could be undertaken, possible thrusts for the project, early proof-of-principle applications, and similar efforts upon which the project could be modeled.