90 research outputs found
The Firegoose: two-way integration of diverse data from different bioinformatics web resources with desktop applications
Background: Information resources on the World Wide Web play an indispensable role in modern biology. But integrating data from multiple sources is often encumbered by the need to reformat data files, convert between naming systems, or perform ongoing maintenance of local copies of public databases. Opportunities for new ways of combining and re-using data are arising as a result of the increasing use of web protocols to transmit structured data.
Results: The Firegoose, an extension to the Mozilla Firefox web browser, enables data transfer between web sites and desktop tools. As a component of the Gaggle integration framework, Firegoose can also exchange data with Cytoscape, the R statistical package, Multiexperiment Viewer (MeV), and several other popular desktop software tools. Firegoose adds the capability to easily use local data to query KEGG, EMBL STRING, DAVID, and other widely-used bioinformatics web sites. Query results from these web sites can be transferred to desktop tools for further analysis with a few clicks. Firegoose acquires data from the web by screen scraping, microformats, embedded XML, or web services. We define a microformat, which allows structured information compatible with the Gaggle to be embedded in HTML documents. We demonstrate the capabilities of this software by performing an analysis of the genes activated in the microbe Halobacterium salinarum NRC-1 in response to anaerobic environments. Starting with microarray data, we explore functions of differentially expressed genes by combining data from several public web resources and construct an integrated view of the cellular processes involved.
Conclusion: The Firegoose incorporates Mozilla Firefox into the Gaggle environment and enables interactive sharing of data between diverse web resources and desktop software tools without maintaining local copies. Additional web sites can be incorporated easily into the framework using the scripting platform of the Firefox browser. Performing data integration in the browser allows the excellent search and navigation capabilities of the browser to be used in combination with powerful desktop tools.
Integration and visualization of systems biology data in context of the genome
Background: High-density tiling arrays and new sequencing technologies are generating rapidly increasing volumes of transcriptome and protein-DNA interaction data. Visualization and exploration of this data is critical to understanding the regulatory logic encoded in the genome by which the cell dynamically affects its physiology and interacts with its environment.
Results: The Gaggle Genome Browser is a cross-platform desktop program for interactively visualizing high-throughput data in the context of the genome. Important features include dynamic panning and zooming, keyword search and open interoperability through the Gaggle framework. Users may bookmark locations on the genome with descriptive annotations and share these bookmarks with other users. The program handles large sets of user-generated data using an in-process database and leverages the facilities of SQL and the R environment for importing and manipulating data. A key aspect of the Gaggle Genome Browser is interoperability. By connecting to the Gaggle framework, the genome browser joins a suite of interconnected bioinformatics tools for analysis and visualization with connectivity to major public repositories of sequences, interactions and pathways. To this flexible environment for exploring and combining data, the Gaggle Genome Browser adds the ability to visualize diverse types of data in relation to its coordinates on the genome.
Conclusions: Genomic coordinates function as a common key by which disparate biological data types can be related to one another. In the Gaggle Genome Browser, heterogeneous data are joined by their location on the genome to create information-rich visualizations yielding insight into genome organization, transcription and its regulation and, ultimately, a better understanding of the mechanisms that enable the cell to dynamically respond to its environment.
The mPower Study, Parkinson Disease Mobile Data Collected Using Researchkit
Current measures of health and disease are often insensitive, episodic, and subjective. Further, these measures generally are not designed to provide meaningful feedback to individuals. The impact of high-resolution activity data collected from mobile phones is only beginning to be explored. Here we present data from mPower, a clinical observational study about Parkinson disease conducted purely through an iPhone app interface. The study interrogated aspects of this movement disorder through surveys and frequent sensor-based recordings from participants with and without Parkinson disease. Benefitting from large enrollment and repeated measurements on many individuals, these data may help establish baseline variability of real-world activity measurement collected via mobile phones, and ultimately may lead to quantification of the ebbs and flows of Parkinson symptoms. App source code for these data collection modules is available through an open source license for use in studies of other conditions. We hope that releasing data contributed by engaged research participants will seed a new community of analysts working collaboratively on understanding mobile health data to advance human health
Accurate Crystal Structure Prediction of New 2D Hybrid Organic Inorganic Perovskites
Low dimensional hybrid organic-inorganic perovskites (HOIPs) represent a
promising class of electronically active materials for both light absorption
and emission. The design space of HOIPs is extremely large, since a diverse
space of organic cations can be combined with different inorganic frameworks.
This immense design space allows for tunable electronic and mechanical
properties, but also necessitates the development of new tools for in silico
high throughput analysis of candidate structures. In this work, we present an
accurate, efficient, transferable and widely applicable machine learning
interatomic potential (MLIP) for predicting the structure of new 2D HOIPs.
Using the MACE architecture, an MLIP is trained on 86 diverse experimentally
reported HOIP structures. The model is tested on 73 unseen perovskite
compositions, and achieves chemical accuracy with respect to the reference
electronic structure method. Our model is then combined with a simple random
structure search algorithm to predict the structure of hypothetical HOIPs given
only the proposed composition. Success is demonstrated by correctly and
reliably recovering the crystal structure of a set of experimentally known 2D
perovskites. Such a random structure search is impossible with ab initio
methods due to the associated computational cost, but is relatively inexpensive
with the MACE potential. Finally, the procedure is used to predict the
structure formed by a new organic cation with no previously known corresponding
perovskite. Laboratory synthesis of the new hybrid perovskite confirms the
accuracy of our prediction. This capability, applied at scale, enables
efficient screening of thousands of combinations of organic cations and
inorganic layers.
Comment: 14 pages and 9 figures in the main text. Supplementary included in
pd
Leveraging Domain Adaptation for Accurate Machine Learning Predictions of New Halide Perovskites
We combine graph neural networks (GNN) with an inexpensive and reliable
structure generation approach based on the bond-valence method (BVM) to train
accurate machine learning models for screening 222,960 halide perovskites using
statistical estimates of the DFT/PBE formation energy (Ef), and the PBE and HSE
band gaps (Eg). The GNNs were fine-tuned using domain adaptation (DA) from a
source model, which yields a 1.8-fold improvement in Ef and a 1.2- to
1.35-fold improvement in HSE Eg compared to direct training (i.e., without
DA). Using these two ML models, 48 compounds were identified out of 222,960
candidates as both stable and having an HSE Eg relevant for
photovoltaic applications. Of this subset, only 8 have been reported to date,
indicating that 40 compounds remain unexplored to the best of our knowledge and
therefore offer opportunities for experimental examination
Niche adaptation by expansion and reprogramming of general transcription factors
Experimental analysis of TFB family proteins in a halophilic archaeon reveals complex environment-dependent fitness contributions. Gene conversion events among these proteins can generate novel niche adaptation capabilities, a process that may have contributed to archaeal adaptation to extreme environments
Genetic variants in the KIF6 region and coronary event reduction from statin therapy
A single nucleotide polymorphism (SNP) in KIF6, a member of the KIF9 family of kinesins, is associated with differential coronary event reduction from statin therapy in four randomized controlled trials; this SNP (rs20455) is also associated with the risk for coronary heart disease (CHD) in multiple prospective studies. We investigated whether other common SNPs in the KIF6 region were associated with event reduction from statin therapy. Of the 170 SNPs in the KIF6 region investigated in the Cholesterol and Recurrent Events trial (CARE), 28 were associated with differential event reduction from statin therapy (P_interaction < 0.1 in Caucasians, adjusted for age and sex) and were further investigated in the Pravastatin or Atorvastatin Evaluation and Infection Therapy-Thrombolysis In Myocardial Infarction 22 (PROVE IT-TIMI22) and West of Scotland Coronary Prevention Study (WOSCOPS). These analyses revealed that two SNPs (rs9462535 and rs9471077), in addition to rs20455, were associated with event reduction from statin therapy (P_interaction < 0.1 in each of the three studies). The relative risk reduction ranged from 37 to 50% (P < 0.01) in carriers of the minor alleles of these SNPs and from −4 to 13% (P > 0.4) in non-carriers. These three SNPs are in high linkage disequilibrium with one another (r² > 0.84). Functional studies of these variants may help to understand the role of KIF6 in the pathogenesis of CHD and differential response to statin therapy
Prevalence of transcription promoters within archaeal operons and coding sequences
Despite the knowledge of complex prokaryotic-transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of ∼64% of all genes, including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein–DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3′ ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes—events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements
Motion for a Resolution tabled by Mr Richard Balfe for entry in the register pursuant to Rule 49 of the Rules of Procedure on supply of military equipment to states where basic human rights are not respected. Working Documents 1982-83, Document 1-265/82, 19 May 1982
Background: The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called “germline leakage”. The rate of germline leakage across different somatic variant detection pipelines is not well understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNV) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge.
Results: The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases.
Conclusions: The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the values of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software
Human Herpesvirus 6 (HHV-6) Causes Severe Thymocyte Depletion in SCID-hu Thy/Liv Mice
Human herpesvirus 6 (HHV-6) is a potentially immunosuppressive agent that may act as a cofactor in the progression of AIDS. Here, we describe the first small animal model of HHV-6 infection. HHV-6 subgroup A, strain GS, efficiently infected the human thymic tissue implanted in SCID-hu Thy/Liv mice, leading to the destruction of the graft. Viral DNA was detected in Thy/Liv implants by quantitative polymerase chain reaction (PCR) as early as 4 d after inoculation and peaked at day 14. The productive nature of the infection was confirmed by electron microscopy and immunohistochemical staining. Atypical thymocytes with prominent nuclear inclusions were detected by histopathology. HHV-6 replication was associated with severe, progressive thymocyte depletion involving all major cellular subsets. However, intrathymic T progenitor cells (ITTPs) appeared to be more severely depleted than the other subpopulations, and a preferred tropism of HHV-6 for ITTPs was demonstrated by quantitative PCR on purified thymocyte subsets. These findings suggest that thymocyte depletion by HHV-6 may be due to infection and destruction of these immature T cell precursors. Similar results were obtained with strain PL-1, a primary isolate belonging to subgroup B. The severity of the lesions observed in this animal model underscores the possibility that HHV-6 may indeed be immunosuppressive in humans