66 research outputs found
Integration and visualization of systems biology data in context of the genome
Background: High-density tiling arrays and new sequencing technologies are generating rapidly increasing volumes of transcriptome and protein-DNA interaction data. Visualization and exploration of this data is critical to understanding the regulatory logic encoded in the genome by which the cell dynamically affects its physiology and interacts with its environment. Results: The Gaggle Genome Browser is a cross-platform desktop program for interactively visualizing high-throughput data in the context of the genome. Important features include dynamic panning and zooming, keyword search and open interoperability through the Gaggle framework. Users may bookmark locations on the genome with descriptive annotations and share these bookmarks with other users. The program handles large sets of user-generated data using an in-process database and leverages the facilities of SQL and the R environment for importing and manipulating data. A key aspect of the Gaggle Genome Browser is interoperability. By connecting to the Gaggle framework, the genome browser joins a suite of interconnected bioinformatics tools for analysis and visualization with connectivity to major public repositories of sequences, interactions and pathways. To this flexible environment for exploring and combining data, the Gaggle Genome Browser adds the ability to visualize diverse types of data in relation to its coordinates on the genome. Conclusions: Genomic coordinates function as a common key by which disparate biological data types can be related to one another. In the Gaggle Genome Browser, heterogeneous data are joined by their location on the genome to create information-rich visualizations yielding insight into genome organization, transcription and its regulation and, ultimately, a better understanding of the mechanisms that enable the cell to dynamically respond to its environment.
The Firegoose: two-way integration of diverse data from different bioinformatics web resources with desktop applications
Background: Information resources on the World Wide Web play an indispensable role in modern biology. But integrating data from multiple sources is often encumbered by the need to reformat data files, convert between naming systems, or perform ongoing maintenance of local copies of public databases. Opportunities for new ways of combining and re-using data are arising as a result of the increasing use of web protocols to transmit structured data. Results: The Firegoose, an extension to the Mozilla Firefox web browser, enables data transfer between web sites and desktop tools. As a component of the Gaggle integration framework, Firegoose can also exchange data with Cytoscape, the R statistical package, Multiexperiment Viewer (MeV), and several other popular desktop software tools. Firegoose adds the capability to easily use local data to query KEGG, EMBL STRING, DAVID, and other widely-used bioinformatics web sites. Query results from these web sites can be transferred to desktop tools for further analysis with a few clicks. Firegoose acquires data from the web by screen scraping, microformats, embedded XML, or web services. We define a microformat, which allows structured information compatible with the Gaggle to be embedded in HTML documents. We demonstrate the capabilities of this software by performing an analysis of the genes activated in the microbe Halobacterium salinarum NRC-1 in response to anaerobic environments. Starting with microarray data, we explore functions of differentially expressed genes by combining data from several public web resources and construct an integrated view of the cellular processes involved. Conclusion: The Firegoose incorporates Mozilla Firefox into the Gaggle environment and enables interactive sharing of data between diverse web resources and desktop software tools without maintaining local copies. Additional web sites can be incorporated easily into the framework using the scripting platform of the Firefox browser. Performing data integration in the browser allows the excellent search and navigation capabilities of the browser to be used in combination with powerful desktop tools.
Accurate Crystal Structure Prediction of New 2D Hybrid Organic Inorganic Perovskites
Low dimensional hybrid organic-inorganic perovskites (HOIPs) represent a
promising class of electronically active materials for both light absorption
and emission. The design space of HOIPs is extremely large, since a diverse
space of organic cations can be combined with different inorganic frameworks.
This immense design space allows for tunable electronic and mechanical
properties, but also necessitates the development of new tools for in silico
high throughput analysis of candidate structures. In this work, we present an
accurate, efficient, transferable and widely applicable machine learning
interatomic potential (MLIP) for predicting the structure of new 2D HOIPs.
Using the MACE architecture, an MLIP is trained on 86 diverse experimentally
reported HOIP structures. The model is tested on 73 unseen perovskite
compositions, and achieves chemical accuracy with respect to the reference
electronic structure method. Our model is then combined with a simple random
structure search algorithm to predict the structure of hypothetical HOIPs given
only the proposed composition. Success is demonstrated by correctly and
reliably recovering the crystal structure of a set of experimentally known 2D
perovskites. Such a random structure search is impossible with ab initio
methods due to the associated computational cost, but is relatively inexpensive
with the MACE potential. Finally, the procedure is used to predict the
structure formed by a new organic cation with no previously known corresponding
perovskite. Laboratory synthesis of the new hybrid perovskite confirms the
accuracy of our prediction. This capability, applied at scale, enables
efficient screening of thousands of combinations of organic cations and
inorganic layers. Comment: 14 pages and 9 figures in the main text. Supplementary included in
pd
Leveraging Domain Adaptation for Accurate Machine Learning Predictions of New Halide Perovskites
We combine graph neural networks (GNN) with an inexpensive and reliable
structure generation approach based on the bond-valence method (BVM) to train
accurate machine learning models for screening 222,960 halide perovskites using
statistical estimates of the DFT/PBE formation energy (Ef), and the PBE and HSE
band gaps (Eg). The GNNs were fine-tuned using domain adaptation (DA) from a
source model, which yields a factor of 1.8 times improvement in Ef and 1.2 -
1.35 times improvement in HSE Eg compared to direct training (i.e., without
DA). Using these two ML models, 48 compounds were identified out of 222,960
candidates as both stable and having an HSE Eg relevant for
photovoltaic applications. Of this subset, only 8 have been reported to date,
indicating that 40 compounds remain unexplored to the best of our knowledge and
therefore offer opportunities for potential experimental examination.
Niche adaptation by expansion and reprogramming of general transcription factors
Experimental analysis of TFB family proteins in a halophilic archaeon reveals complex environment-dependent fitness contributions. Gene conversion events among these proteins can generate novel niche adaptation capabilities, a process that may have contributed to archaeal adaptation to extreme environments.
Genetic variants in the KIF6 region and coronary event reduction from statin therapy
A single nucleotide polymorphism (SNP) in KIF6, a member of the KIF9 family of kinesins, is associated with differential coronary event reduction from statin therapy in four randomized controlled trials; this SNP (rs20455) is also associated with the risk for coronary heart disease (CHD) in multiple prospective studies. We investigated whether other common SNPs in the KIF6 region were associated with event reduction from statin therapy. Of the 170 SNPs in the KIF6 region investigated in the Cholesterol and Recurrent Events trial (CARE), 28 were associated with differential event reduction from statin therapy (P_interaction < 0.1 in Caucasians, adjusted for age and sex) and were further investigated in the Pravastatin or Atorvastatin Evaluation and Infection Therapy-Thrombolysis In Myocardial Infarction 22 (PROVE IT-TIMI22) and West of Scotland Coronary Prevention Study (WOSCOPS). These analyses revealed that two SNPs (rs9462535 and rs9471077), in addition to rs20455, were associated with event reduction from statin therapy (P_interaction < 0.1 in each of the three studies). The relative risk reduction ranged from 37 to 50% (P < 0.01) in carriers of the minor alleles of these SNPs and from −4 to 13% (P > 0.4) in non-carriers. These three SNPs are in high linkage disequilibrium with one another (r² > 0.84). Functional studies of these variants may help to understand the role of KIF6 in the pathogenesis of CHD and differential response to statin therapy.
Prevalence of transcription promoters within archaeal operons and coding sequences
Despite the knowledge of complex prokaryotic-transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of ∼64% of all genes, including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein–DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3′ ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes, events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements.
Motion for a Resolution tabled by Mr Richard Balfe for entry in the register pursuant to Rule 49 of the Rules of Procedure on supply of military equipment to states where basic human rights are not respected. Working Documents 1982-83, Document 1-265/82, 19 May 1982
Background: The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called “germline leakage”. The rate of germline leakage across different somatic variant detection pipelines is not well-understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNVs) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. Results: The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases. Conclusions: The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the values of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.
Astrometry and geodesy with radio interferometry: experiments, models, results
Summarizes current status of radio interferometry at radio frequencies
between Earth-based receivers, for astrometric and geodetic applications.
Emphasizes theoretical models of VLBI observables that are required to extract
results at the present accuracy levels of 1 cm and 1 nanoradian. Highlights the
achievements of VLBI during the past two decades in reference frames, Earth
orientation, atmospheric effects on microwave propagation, and relativity. Comment: 83 pages, 19 Postscript figures. To be published in Rev. Mod. Phys.,
Vol. 70, Oct. 199
A US perspective on closing the carbon cycle to defossilize difficult-to-electrify segments of our economy
Electrification to reduce or eliminate greenhouse gas emissions is essential to mitigate climate change. However, a substantial portion of our manufacturing and transportation infrastructure will be difficult to electrify and/or will continue to use carbon as a key component, including areas in aviation, heavy-duty and marine transportation, and the chemical industry. In this Roadmap, we explore how multidisciplinary approaches will enable us to close the carbon cycle and create a circular economy by defossilizing these difficult-to-electrify areas and those that will continue to need carbon. We discuss two approaches for this: developing carbon alternatives and improving our ability to reuse carbon, enabled by separations. Furthermore, we posit that co-design and use-driven fundamental science are essential to reach aggressive greenhouse gas reduction targets.