34 research outputs found

Data structures and compression algorithms for high-throughput sequencing technologies

Author: Baldi Pierre
Christley Scott
Daily Kenny
Rigor Paul
Xie Xiaohui
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: High-throughput sequencing (HTS) technologies play important roles in the life sciences by allowing the rapid parallel sequencing of very large numbers of relatively short nucleotide sequences, in applications ranging from genome sequencing and resequencing to digital microarrays and ChIP-Seq experiments. As experiments scale up, HTS technologies create new bioinformatics challenges for the storage and sharing of HTS data. Results: We develop data structures and compression algorithms for HTS data. A processing stage maps short sequences to a reference genome or a large table of sequences. Then the integers representing the short sequence absolute or relative addresses, their length, and the substitutions they may contain are compressed and stored using various entropy coding algorithms, including both old and new fixed codes (e.g. Golomb, Elias Gamma, MOV) and variable codes (e.g. Huffman). The general methodology is illustrated and applied to several HTS data sets. Results show that the information contained in HTS files can be compressed by a factor of 10 or more, depending on the statistical properties of the data sets and various other choices and constraints. Our algorithms fare well against general purpose compression programs such as gzip, bzip2 and 7zip; timing results show that our algorithms are consistently faster than the best general purpose compression programs. Conclusions: It is not likely that exactly one encoding strategy will be optimal for all types of HTS data. Different experimental conditions are going to generate various data distributions whereby one encoding strategy can be more effective than another. We have implemented some of our encoding algorithms into the software package GenCompress, which is available upon request from the authors. With the advent of HTS technology and increasingly new experimental protocols for using the technology, sequence databases are expected to continue rising in size. The methodology we have proposed is general, and these advanced compression techniques should allow researchers to manage and share their HTS data in a more timely fashion.
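
As a rough illustration of the idea in the abstract above (storing relative addresses of mapped reads with a fixed code), the following minimal sketch delta-encodes sorted read positions and writes each gap with the Elias gamma code. It is not the GenCompress implementation: the function names, the use of a `'0'`/`'1'` string as the bit stream, and the `+1` offset that makes zero gaps encodable are all assumptions made for this example.

```python
def elias_gamma_encode(n: int) -> str:
    """Elias gamma code of a positive integer, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias gamma is defined for integers >= 1")
    binary = bin(n)[2:]                      # binary digits, no '0b' prefix
    return "0" * (len(binary) - 1) + binary  # unary length prefix + value


def elias_gamma_decode_stream(bits: str) -> list[int]:
    """Decode a concatenation of Elias gamma codewords."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":                # count the unary prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def compress_positions(positions: list[int]) -> str:
    """Hypothetical helper: sort positions, store gaps (relative addresses)."""
    out = []
    prev = 0
    for p in sorted(positions):
        out.append(elias_gamma_encode(p - prev + 1))  # +1 so a gap of 0 is valid
        prev = p
    return "".join(out)


def decompress_positions(bits: str) -> list[int]:
    """Inverse of compress_positions: rebuild absolute positions from gaps."""
    positions = []
    prev = 0
    for gap_plus_one in elias_gamma_decode_stream(bits):
        prev += gap_plus_one - 1
        positions.append(prev)
    return positions
```

Small gaps get short codewords, so this kind of code pays off when many reads map close together; as the abstract notes, which code works best depends on the distribution of the data.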

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

MotifMap: integrative genome-wide maps of regulatory motif sites for model species

Author: Baldi Pierre
Daily Kenneth
Patel Vishal R
Rigor Paul
Xie Xiaohui
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: A central challenge of biology is to map and understand gene regulation on a genome-wide scale. For any given genome, only a small fraction of the regulatory elements embedded in the DNA sequence have been characterized, and there is great interest in developing computational methods to systematically map all these elements and understand their relationships. Such computational efforts, however, are significantly hindered by the overwhelming size of non-coding regions and the statistical variability and complex spatial organizations of regulatory elements and interactions. Genome-wide catalogs of regulatory elements for all model species simply do not yet exist. Results: The MotifMap system uses databases of transcription factor binding motifs, refined genome alignments, and a comparative genomic statistical approach to provide comprehensive maps of candidate regulatory elements encoded in the genomes of model species. The system is used to derive new genome-wide maps for yeast, fly, worm, mouse, and human. The human map contains 519,108 sites for 570 matrices with a False Discovery Rate of 0.1 or less. The new maps are assessed in several ways, for instance using high-throughput experimental ChIP-seq data and AUC statistics, providing strong evidence for their accuracy and coverage. The maps can be usefully integrated with many other kinds of omic data and are available at http://motifmap.igb.uci.edu/. Conclusions: MotifMap and its integration with other data provide a foundation for analyzing gene regulation on a genome-wide scale, and for automatically generating regulatory pathways and hypotheses. The power of this approach is demonstrated and discussed using the P53 apoptotic pathway and the Gli hedgehog pathways as examples.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Increased fluxes of shelf-derived materials to the central Arctic Ocean

Author: Charette Matthew A.
Henderson Paul B.
Kipp Lauren
Moore Willard S.
Rigor Ignatius
Publication venue: American Association for the Advancement of Science (AAAS)
Publication date: 03/01/2018

© The Author(s), 2018. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Science Advances 4 (2018): eaao1302, doi:10.1126/sciadv.aao1302. Rising temperatures in the Arctic Ocean region are responsible for changes such as reduced ice cover, permafrost thawing, and increased river discharge, which, together, alter nutrient and carbon cycles over the vast Arctic continental shelf. We show that the concentration of radium-228, sourced to seawater through sediment-water exchange processes, has increased substantially in surface waters of the central Arctic Ocean over the past decade. A mass balance model for 228Ra suggests that this increase is due to an intensification of shelf-derived material inputs to the central basin, a source that would also carry elevated concentrations of dissolved organic carbon and nutrients. Therefore, we suggest that significant changes in the nutrient, carbon, and trace metal balances of the Arctic Ocean are underway, with the potential to affect biological productivity and species assemblages in Arctic surface waters. This work was funded by NSF awards OCE-1458305 to M.A.C. and OCE-1458424 to W.S.M. The Mackenzie River sampling was supported by a Graduate Student Research Award from the North Pacific Research Board to L.E.K. L.E.K. also acknowledges support from a National Defense Science and Engineering Graduate Fellowship. I.G.R. acknowledges funding by the contributors to the U.S. Interagency Arctic Buoy Program, which include the U.S. Coast Guard, the Department of Energy, NASA, the U.S. Navy, the National Oceanic and Atmospheric Administration, and NSF.

Woods Hole Open Access Server

Rowan University

Retrotransposon profiling of RNA polymerase III initiation sites

Author: Baldi Pierre
Daily Kenneth
Forouzan Sholeh
Johnston Mark
Mayhew David
Mitra Robi David
Nguyen Kim
Qi Xiaojie
Rigor Paul
Sandmeyer Suzanne
Wang Haoyi
Publication venue: Digital Commons@Becker
Publication date: 01/01/2012

Although retroviruses are relatively promiscuous in choice of integration sites, retrotransposons can display marked integration specificity. In yeast and slime mold, some retrotransposons are associated with tRNA genes (tDNAs). In the Saccharomyces cerevisiae genome, the long terminal repeat retrotransposon Ty3 is found at RNA polymerase III (Pol III) transcription start sites of tDNAs. Ty1, 2, and 4 elements also cluster in the upstream regions of these genes. To determine the extent to which other Pol III-transcribed genes serve as genomic targets for Ty3, a set of 10,000 Ty3 genomic retrotranspositions were mapped using high-throughput DNA sequencing. Integrations occurred at all known tDNAs, two tDNA relics (iYGR033c and ZOD1), and six non-tDNA, Pol III-transcribed types of genes (RDN5, SNR6, SNR52, RPR1, RNA170, and SCR1). Previous work in vitro demonstrated that the Pol III transcription factor (TF) IIIB is important for Ty3 targeting. However, seven loci that bind the TFIIIB loader, TFIIIC, were not targeted, underscoring the unexplained absence of TFIIIB at those sites. Ty3 integrations also occurred in two open reading frames not previously associated with Pol III transcription, suggesting the existence of a small number of additional sites in the yeast genome that interact with Pol III transcription complexes

Crossref

Digital Commons@Becker

PubMed Central

eScholarship - University of California

SKS Splitting beneath Continental Rift Zones

Author: Davis Paul M.
Gao Stephen S.
Kozhevnikov Vladimir M.
Liu Kelly H.
Logatchev Nikolai A.
Mordvinova Valentina V.
Rigor Andrew W.
Slack Philip D.
Zorin Yu A.
Publication venue: Scholars' Mine
Publication date: 01/10/1997

We present measurements of SKS splitting at 28 digital seismic stations and 35 analog stations in the Baikal rift zone, Siberia, and adjacent areas, and at 17 stations in the East African Rift in Kenya and compare them with previous measurements from the Rio Grande Rift of North America. Fast directions in the inner region of the Baikal rift zone are distributed in two orthogonal directions, NE and NW, approximately parallel and perpendicular to the NE strike of the rift. In the adjacent Siberian platform and northern Mongolian fold belt, only the rift-orthogonal fast direction is observed. In southcentral Mongolia, the dominant fast direction changes to rift-parallel again, although a small number of measurements are still rift-orthogonal. For the axial zones of the East African and Rio Grande Rifts, fast directions are oriented on average NNE, that is, rotated clockwise from the N-S trending rift. All three rifts are underlain by low-velocity upper mantle as determined from teleseismic tomography. Rift-related mantle flow provides a plausible interpretation for the rift-orthogonal fast directions. The rift-parallel fast directions near the rift axes can be interpreted by oriented magmatic cracks in the mantle or small-scale mantle convection with rift-parallel flow. The agreement between stress estimates and corresponding crack orientations lends some weight to the suggestion that the rift-parallel fast directions are caused by oriented magmatic cracks.

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Reply [to “Comment on “SKS Splitting beneath Continental Rifts Zones” by Gao et al.”]

Author: Davis Paul M.
Gao Stephen S.
Kozhevnikov Vladimir M.
Liu Kelly H.
Logatchev Nikolai A.
Mordvinova Valentina V.
Rigor Andrew W.
Slack Philip D.
Zorin Yuliy A.
Publication venue: Scholars' Mine
Publication date: 01/05/1999

Vauchez et al. [this issue] (hereinafter referred to as VBN) interpret the petrologic, tomographic, and anisotropy data from continental rifts to support a model of continental rifting [Nicolas, 1993; Nicolas et al., 1994] in which the lithosphere splits along the rift axis and asthenosphere flows in from the sides to fill the resulting gap. We suggest here that the data can also be described by a model in which the lower lithosphere is modified or eroded by active mantle upwelling over a region of significantly greater dimensions than the rift graben and that partial melt developing in the upwelling region can account for the widespread volcanism, as well as the seismic properties. Nicolas [1993] argued that rift-aligned anisotropy could be explained by rift-parallel mantle flow. We thank VBN for bringing this relevant paper to our attention. Volcanism about the East African Rift and the Rio Grande is not confined to the rifts but extends hundreds of kilometers from the rift axes (Mount Kilimanjaro, Mount Elgon, Mount Kenya in East Africa, the Jemez Lineament on the Rio Grande) in regions uplifted relative to their surroundings. The low-velocity tomographic anomalies also extend beneath the uplifted regions and are thought to be related to the uplift, possibly supporting it by thermostatic buoyancy. The size of the P and S velocity contrasts and attenuation of high frequencies have led to the suggestion that large regions of the anomalous bodies have temperatures at or above the solidus [Achauer et al., 1994; Slack et al., 1994, 1996]. The wide extent of the anomalous regions is not explicable as resulting from an abyssal lithospheric dike beneath the rift intruded by asthenosphere. The extension of the East African, Baikal, and Rio Grande rift grabens has been estimated to be about 10 km [Baker et al., 1972; Baldridge et al., 1984; Morgan and Golombek, 1984; Logatchev and Florensov, 1978]. Passive influx of asthenosphere into a 10 km lithospheric dike is insufficient to explain the tomographic anomalies [Davis, 1991]. In addition, the amount of finite strain from lithospheric diking is insufficient to explain the anisotropy anomalies. Active replacement or modification of lower lithosphere either prior to, or contemporaneous with, rifting could generate tomographic anomalies of this magnitude.

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Remote Sensing of Antarctic Sea Ice with Coordinated Aircraft and Satellite Data Acquisitions

Author: Ackley Stephen
Bachmann Markus
Busche Thomas Edmund
Haas Christian
Kraus Thomas
Kurtz Nathan
Langhorne Pat
Maksym Ted
Morin Paul
Neumann Gregory
Nghiem Son
Nguyen Lisa
Panowicz Caryn
Rack Wolfgang
Rigor Ignatius
Sonntag John
Tinto Kirsteen
Woods John
Xie Hongjie
Publication date: 22/07/2018

Remote sensing of Antarctic sea ice is required to characterize properties of the vast sea ice cover to understand its long-term increase in contrast to the decrease of Arctic sea ice. For this objective, the OIB/TanDEM-X Coordinated Science Campaign (OTASC) was successfully conducted in 2017 to obtain contemporaneous and collocated remote sensing data from NASA's Operation IceBridge (OIB) and the German Aerospace Center (DLR) TanDEM-X Synthetic Aperture Radar (SAR) system at X-band together with Sentinel-1 and RADARSAT-2 SARs at C-band in conjunction with WorldView satellite spectral sensors, surface measurements, and field observations. The Weddell Sea and the Ross Sea were two primary regions while SAR data were also collected over six other regions in the Southern Ocean. Satellite SAR data included both polarimetric and interferometric capabilities to infer snow and sea ice information in three dimensions (3D), while OIB/P-3 aircraft data include snow radar together with altimeter data for snow and sea ice observations in 3D over the Weddell Sea. Across the Ross Sea, IcePOD and AntNZ/York-University flights were carried out together with satellite SAR data acquisitions

Institute of Transport Research:Publications

Crossref

NASA Technical Reports Server

Field and Satellite Observations of the Formation and Distribution of Arctic Atmospheric Bromine Above a Rejuvenated Sea Ice Cover

Author: Asplin Matthew G.
Barber David G.
Bottenheim Jan
Burrows John P.
Clemente-Colon Pablo
Hall Dorothy K.
Kaleschke Lars
Latonas Jeff
Martin Seelye
Neumann Gregory
Nghiem Son V.
Richter Andreas
Rigor Ignatius G.
Shepson Paul B.
Steffen Alexandra
Stern Gary
Tackett Philip
Wang Feiyue
Publication date: 01/01/2012

Recent drastic reduction of the older perennial sea ice in the Arctic Ocean has resulted in a vast expansion of younger and saltier seasonal sea ice. This increase in the salinity of the overall ice cover could impact tropospheric chemical processes. Springtime perennial ice extent in 2008 and 2009 broke the half-century record minimum in 2007 by about one million km². In both years seasonal ice was dominant across the Beaufort Sea extending to the Amundsen Gulf, where significant field and satellite observations of sea ice, temperature, and atmospheric chemicals have been made. Measurements at the site of the Canadian Coast Guard Ship Amundsen ice breaker in the Amundsen Gulf showed events of increased bromine monoxide (BrO), coupled with decreases of ozone (O3) and gaseous elemental mercury (GEM), during cold periods in March 2008. The timing of the main event of BrO, O3, and GEM changes was found to be consistent with BrO observed by satellites over an extensive area around the site. Furthermore, satellite sensors detected a doubling of atmospheric BrO in a vortex associated with a spiral rising air pattern. In spring 2009, excessive and widespread bromine explosions occurred in the same region while the regional air temperature was low and the extent of perennial ice was significantly reduced compared to the case in 2008. Using satellite observations together with a Rising-Air-Parcel model, we discover a topographic control on BrO distribution such that the Alaskan North Slope and the Canadian Shield region were exposed to elevated BrO, whereas the surrounding mountains isolated the Alaskan interior from bromine intrusion.

NASA Technical Reports Server

MPG.PuRe

Consistent and contrasting decadal Arctic sea ice thickness predictions from a highly optimized sea ice model

Author: Daniel L. Feltham
Paul A. Miller
Seymour W. Laxon
Publication venue: American Geophysical Union (AGU)
Publication date: 01/01/2007

Decadal hindcast simulations of Arctic Ocean sea ice thickness made by a modern dynamic-thermodynamic sea ice model and forced independently by both the ERA-40 and NCEP/NCAR reanalysis data sets are compared for the first time. Using comprehensive data sets of observations made between 1979 and 2001 of sea ice thickness, draft, extent, and speeds, we find that it is possible to tune model parameters to give satisfactory agreement with observed data, thereby highlighting the skill of modern sea ice models, though the parameter values chosen differ according to the model forcing used. We find a consistent decreasing trend in Arctic Ocean sea ice thickness since 1979, and a steady decline in the Eastern Arctic Ocean over the full 40-year period of comparison that accelerated after 1980, but the predictions of Western Arctic Ocean sea ice thickness between 1962 and 1980 differ substantially. The origins of differing thickness trends and variability were isolated not to parameter differences but to differences in the forcing fields applied, and in how they are applied. It is argued that uncertainty, differences and errors in sea ice model forcing sets complicate the use of models to determine the exact causes of the recently reported decline in Arctic sea ice thickness, but help in the determination of robust features if the models are tuned appropriately against observations.

Central Archive at the University of Reading

Lund University Publications

Crossref

NERC Open Research Archive

Decoding the Molecular Universe -- Workshop Report

On August 9-10, 2023, a workshop was convened at the Pacific Northwest National Laboratory (PNNL) in Richland, WA that brought together a group of internationally recognized experts in metabolomics, natural products discovery, chemical ecology, chemical and biological threat assessment, cheminformatics, computational chemistry, cloud computing, artificial intelligence, and novel technology development. These experts were invited to assess the value and feasibility of a grand-scale project to create new technologies that would allow the identification and quantification of all small molecules, or to decode the molecular universe. The Decoding the Molecular Universe project would extend and complement the success of the Human Genome Project by developing new capabilities and technologies to measure small molecules (defined as non-protein, non-polymer molecules less than 1500 Daltons) of any origin and generated in biological systems or produced abiotically. Workshop attendees 1) explored what new understanding of biological and environmental systems could be revealed through the lens of small molecules; 2) characterized the similarities in current needs and technical challenges between each science or mission area for unambiguous and comprehensive determination of the composition and quantities of small molecules of any sample; 3) determined the extent to which technologies or methods currently exist for unambiguously and comprehensively determining the small molecule composition of any sample and in a reasonable time; and 4) identified the attributes of the ideal technology or approach for universal small molecule measurement and identification. The workshop concluded with a discussion of how a project of this scale could be undertaken, possible thrusts for the project, early proof-of-principle applications, and similar efforts upon which the project could be modeled

arXiv.org e-Print Archive