Partout: A Distributed Engine for Efficient RDF Processing
The increasing interest in Semantic Web technologies has led not only to a
rapid growth of semantic data on the Web but also to an increasing number of
backend applications with already more than a trillion triples in some cases.
Confronted with such huge amounts of data and the future growth, existing
state-of-the-art systems for storing RDF and processing SPARQL queries are no
longer sufficient. In this paper, we introduce Partout, a distributed engine
for efficient RDF processing in a cluster of machines. We propose an effective
approach for fragmenting RDF data sets based on a query log, allocating the
fragments to nodes in a cluster, and finding the optimal configuration. Partout
can efficiently handle updates and its query optimizer produces efficient query
execution plans for ad-hoc SPARQL queries. Our experiments show the superiority
of our approach to state-of-the-art approaches for partitioning and distributed
SPARQL query processing.
Nanopore sequencing of native adeno-associated virus (AAV) single-stranded DNA using a transposase-based rapid protocol
Radukic M, Brandt D, Haak M, Müller K, Kalinowski J. Nanopore sequencing of native adeno-associated virus (AAV) single-stranded DNA using a transposase-based rapid protocol. NAR Genomics and Bioinformatics. 2020;2(4): lqaa074.
Next-generation sequencing of single-stranded DNA (ssDNA) enables transgene characterization of gene therapy vectors such as adeno-associated virus (AAV), but current library generation uses complicated and potentially biased second-strand synthesis. We report that libraries for nanopore sequencing of ssDNA can be conveniently created without second-strand synthesis using a transposase-based protocol. We show for bacteriophage M13 ssDNA that the MuA transposase has unexpected residual activity on ssDNA, explained in part by transposase action on transient double-stranded hairpins. In the case of AAV, library creation is additionally aided by genome hybridization. We demonstrate the power of direct sequencing combined with nanopore long reads by characterizing AAV vector transgenes. Sequencing yielded reads up to full genome length, including GC-rich inverted terminal repeats. Unlike short-read techniques, single reads covered genome-genome and genome-contaminant fusions and other recombination events, whilst additionally providing information on epigenetic methylation. Single-nucleotide variants across the transgene cassette were revealed and secondary genome packaging signals were readily identified. Moreover, comparison of sequence abundance with quantitative polymerase chain reaction results demonstrated the technique's future potential for quantification of DNA impurities in AAV vector stocks. The findings promote direct nanopore sequencing as a fast and versatile platform for ssDNA characterization, such as AAV ssDNA in research and clinical settings.
Fragment-based Pretraining and Finetuning on Molecular Graphs
Property prediction on molecular graphs is an important application of Graph
Neural Networks. Recently, unlabeled molecular data has become abundant, which
facilitates the rapid development of self-supervised learning for GNNs in the
chemical domain. In this work, we propose pretraining GNNs at the fragment
level, a promising middle ground to overcome the limitations of node-level and
graph-level pretraining. Borrowing techniques from recent work on principal
subgraph mining, we obtain a compact vocabulary of prevalent fragments from a
large pretraining dataset. From the extracted vocabulary, we introduce several
fragment-based contrastive and predictive pretraining tasks. The contrastive
learning task jointly pretrains two different GNNs: one on molecular graphs and
the other on fragment graphs, which represent higher-order connectivity within
molecules. By enforcing consistency between the fragment embedding and the
aggregated embedding of the corresponding atoms from the molecular graphs, we
ensure that the embeddings capture structural information at multiple
resolutions. The structural information of fragment graphs is further exploited
to extract auxiliary labels for graph-level predictive pretraining. We employ
both the pretrained molecular-based and fragment-based GNNs for downstream
prediction, thus utilizing the fragment information during finetuning. Our
graph fragment-based pretraining (GraphFP) improves performance on 5 out
of 8 common molecular benchmarks and improves performance on long-range
biological benchmarks by at least 11.5%. Code is available at:
https://github.com/lvkd84/GraphFP.
Comment: 18 pages, 4 figures, published in NeurIPS 202
Characterization of Protein-Protein Interactions for Therapeutic Drug Design Utilizing Mass Spectrometry
The number of transferrin-based therapeutics progressing to clinical trials remains disappointingly small despite their promising ability to transport therapeutic payloads to cancer cells and across the blood-brain barrier. This meager success record is largely due to the complexity and heterogeneity of protein conjugation products, which make their analytical characterization difficult. In this work, transferrin is conjugated to lysozyme as a model therapeutic to deliver this bacteriostatic protein to central nervous system infections. ESI- and MALDI-MS were used to characterize the modification sites at lysine residues, with the aim of characterizing heterogeneity within the conjugate. Identification and quantitation of modification sites using MS on tryptically digested samples proved difficult because of poor signal-to-noise ratios and missing peptide fragments. An 18O labeling method that exchanges both C-terminal oxygen atoms with 18O provided more reliable results, but it remained difficult to observe all needed peptide fragments. MALDI-MS allowed verification of the ESI-MS results but was unhelpful for full characterization because of abundant overlap of isotopically labeled peaks. To create an ideal 1:1 binding ratio between the two proteins, a site-specific modification method using kinetically controlled conditions was applied; it was confirmed that the method, although capable of producing 1:1 conjugated species, actually created different isomers with distinct binding frequencies at each lysine. Online IEC helped identify the isomers and began the work of correlating modification sites with the bioactivity of the proteins. It was determined that lysozyme has a high chance of being modified at lysine 33 and 116, with a possibility of also being highly modified at lysine 97.
More work is needed to complete the characterization, especially with transferrin, but the experimental approaches developed in this work are promising. This work aims to deliver an optimized framework for analytical characterization of protein and antibody conjugates to guide the development of future biopharmaceuticals.
Molecular binding of formaldehyde to DNA and proteins
Formaldehyde is produced worldwide on a large scale (21 million tons in 2000) and used in a wide spectrum of applications. Its toxicity and carcinogenic effects have evoked numerous public health concerns. According to the International Agency for Research on Cancer (IARC), formaldehyde is classified as a known animal and human carcinogen, causing nasal cancer. More limited epidemiologic evidence suggests that formaldehyde can also induce leukemia in humans; however, this is controversial. In this dissertation, we have designed an integrated bottom-up approach to address critical issues and better understand formaldehyde's carcinogenic potential. Specifically, the N-terminus of histone and lysine residues located in both the histone N-terminal tail and the globular fold domain were identified as binding sites for formaldehyde in the current study. We also found that formaldehyde-induced lysine adducts could inhibit the formation of post-translational modifications on histone, raising the possibility that formaldehyde might alter epigenetic regulation. We have also elucidated the structures of DNA-protein crosslinks induced by formaldehyde. Detailed characterization of the formaldehyde-derived linkage of single amino acids with nucleosides by NMR and mass spectrometry established that these amino acids all form cross-links involving a formaldehyde-derived methylene bridge. Our results also demonstrated that Lys-dG cross-links are the most common DNA-protein crosslinks induced by formaldehyde; however, they are very labile. The finding that Cys-CH2-dG cross-links could be initiated by the S-hydroxymethyl group of a cysteine residue led to the identification of a novel dG-CH2-GSH adduct. This adduct is unique because of the involvement of S-hydroxymethylglutathione, a key player in the detoxification of formaldehyde.
After our extensive work on biomarker discovery and validation involving DNA monoadducts and DNA-DNA cross-links, we applied these methods to analyze DNA samples from rats exposed to [13CD2]-formaldehyde for 1 day and 5 days. The results show that exogenous formaldehyde induced N2-hydroxymethyl-dG monoadducts and dG-dG cross-links in DNA from rat nasal mucosa, but did not form [13CD2]-adducts in distant tissues despite analyzing 5 times more DNA than for nasal epithelium. These data provide strong evidence supporting a genotoxic and cytotoxic mode of action for inhaled formaldehyde in the target tissue for carcinogenesis, but do not support the biological plausibility that inhaled formaldehyde causes leukemia in rats.
Aspects of cyclodextrin host-guest complexes in mass spectrometry
Cancer is a widespread disease characterized by uncontrolled cellular replication; it caused 9.6 million deaths worldwide in 2018. One approach in cancer treatment is inhibiting the replication process by the administration of organometallic compounds that bind to DNA. Cisplatin is one of the most prominent organometallic compounds that reached clinical approval. However, it suffers from severe side effects (e.g., nephrotoxicity) and causes the development of resistance. Various other metallorganic drugs have been evaluated for their potential in cancer treatment. Of these, titanocene dichloride entered clinical trials but showed only low efficacy in patients. Titanocene dichloride is a representative of the class of bent metallocene dihalides, which comprise a tetrahedral structure with two cyclopentadienyl and two halogenide ligands and a metal ion as the central atom. Hydrolysis of the halogenide ligands is a crucial step in the activation of the metallocene, allowing for the interaction with its biological target. Unfortunately, extensive hydrolysis of the halogenide and the cyclopentadienyl ligands is detected for titanocene in an aqueous environment under physiological conditions, leading to its inactivation. One approach for increasing the hydrolytic stability of titanocene is its inclusion within the cavity of a macrocyclic host structure. Cyclodextrins are such macrocyclic compounds, composed of six to eight 1,4-linked α-D-glucopyranose units, and are considered nontoxic upon oral administration. Therefore, several aspects of cyclodextrin host-guest complexes in mass spectrometry have been investigated and are discussed in this thesis.
In the first section, the mass spectrometric behavior of cyclodextrins is discussed. The central part of this project was the elucidation of the fragmentation mechanism underlying the decomposition of protonated cyclodextrins. Linearization of the macrocyclic structure upon charge-induced cleavage of a glycosidic bond has been revealed as the initial dissociation step. Further decomposition of the linearized structure is characterized by neutral loss of glucose subunits. This dissociation step has been stated to occur upon charge-remote cleavage of other glycosidic bonds, leading to the elimination of a zwitterionic moiety which is potentially internally rearranged.
In the second section, the focus is on the interaction between titanocene and cyclodextrins as elucidated from mass spectrometric experiments. The obtained data indicated the formation of covalent bonds between titanium and the hydroxy groups at the rim of cyclodextrins rather than the formation of an inclusion complex. Consequently, the interaction of titanocene with cyclodextrins did not improve the hydrolytic stability of titanocene at physiological pH.
In-source fragmentation has been found to contribute considerably to the ions detected in full scan mass spectrometry. Therefore, the effect of instrumental parameters on the quality of the obtained full scan mass spectra has been evaluated. While the capillary voltage showed only minor effects, proper adjustment of the capillary temperature and the tube lens voltage significantly improved the quality of the obtained data.
In conclusion, diverse aspects of cyclodextrin host-guest complexes have been successfully investigated using mass spectrometry, showing the potential of this analytical technique for various applications.
Synthetic, Biochemical, X-ray Crystallographic, Computational and High-Throughput Screening Approaches Toward Anthrax Toxin Lethal Factor Inhibition
University of Minnesota Ph.D. dissertation. October 2015. Major: Medicinal Chemistry. Advisor: Elizabeth Amin. 1 computer file (PDF); xvi, 227 pages.
The lethal factor (LF) enzyme secreted by Bacillus anthracis is chiefly responsible for anthrax-related cytotoxicity. In this dissertation, I present the computational design, synthesis, biochemical testing, structural biology, and virtual and high-throughput screening approaches to identify binding requirements for LF inhibition. To this end, we designed ~50 novel compounds to probe design principles and structural requirements for LF. Specifically, in Chapters 2 and 3, computational, synthetic, biochemical and structural biology methods to explore the underinvestigated LF S2′ binding subsite are described. We discovered that LF domain 3 is very flexible and results in a relatively unconstrained S2′ binding site region. Additionally, we found that the S1′ subsite can undergo a novel conformational change resulting in a previously unreported tunnel region, which we term S1′*, that we expect can be further explored to design potent and selective LF inhibitors. Using this novel LF configuration, we virtually screened ~11 million drug-like compounds for activity against LF and identified a novel compound that inhibits LF with an IC50 of 126 μM. In the course of this work, we found that reliable representation of zinc and other transition metal centers in macromolecules is nontrivial, due to the complexity of the coordination environment and charge distribution at the catalytic center. In Chapter 7, I present work on applying and optimizing quantum mechanical methods developed by the Truhlar group to accurately calculate bond dissociation energies at low computational cost for various representative Zn2+ and Cd2+ model systems. By analyzing errors, we developed a prescription for an optimal system fragmentation strategy for our models.
With this scheme, we find that the EE-3B-CE method is able to reproduce 53 conventionally calculated bond energies with an average absolute error of only 0.59 kcal/mol. Therefore, one could use the EE-3B-CE approximation to obtain accurate results for large systems and/or identify better parameters for Zn centers for use in virtual screening. Finally, we present the results of a large-scale in vitro HTS campaign of ~250,000 small molecules against LF. After extensive validation involving secondary assays and hit synthesis, we were able to prioritize a key lead for further prosecution.