
    The AFLOW Fleet for Materials Discovery

    The traditional paradigm for materials discovery has recently been expanded to incorporate substantial data-driven research. With the intent to accelerate the development and deployment of new technologies, the AFLOW Fleet for computational materials design automates high-throughput first-principles calculations and provides tools for data verification and dissemination for a broad community of users. AFLOW incorporates different computational modules to robustly determine thermodynamic stability, electronic band structures, vibrational dispersions, thermo-mechanical properties and more. The AFLOW data repository is publicly accessible online at aflow.org, with more than 1.7 million materials entries and a panoply of queryable computed properties. Tools to programmatically search and process the data, as well as to perform online machine learning predictions, are also available. Comment: 14 pages, 8 figures

    Ultra Performance Liquid Chromatography and High Resolution Mass Spectrometry for the Analysis of Plant Lipids

    Holistic analysis of lipids is becoming increasingly popular in the life sciences. Recently, several interesting mass spectrometry-based studies have been conducted, especially in plant biology. However, while great advances have been made, we are still far from detecting all the lipid species in an organism. In this study we developed an ultra performance liquid chromatography-based method using a high-resolution, accurate-mass mass spectrometer for the comprehensive profiling of more than 260 polar and non-polar Arabidopsis thaliana leaf lipids. The method is fully compatible with the commonly used lipid extraction protocols and provides a viable alternative to the commonly used direct infusion-based shotgun lipidomics approaches. The whole process is described in detail and compared to alternative lipidomic approaches. In addition to the method, we also introduce an in-house developed database search software (GoBioSpace), which allows one to perform targeted or untargeted lipidomic and metabolomic analysis on mass spectrometric data of any kind.

    Methods in automated glycosaminoglycan tandem mass spectra analysis

    Glycosylation is the process by which a glycan is enzymatically attached to a protein, and is one of the most common post-translational modifications in nature. One class of glycans is the glycosaminoglycans (GAGs), which are long, linear polysaccharides that are variably sulfated and make up the glycan portion of proteoglycans (PGs). PGs are located on the cell surface and in the extracellular matrix (ECM), making them important molecules for cell signaling and ligand binding. The GAG sulfation sequence is a determining factor for the signaling capacity of binding complexes, so accurate determination of the sequence is critical. Historically, GAG sequencing using tandem mass spectrometry (MS2) has been a difficult, manual process; however, with the advent of faster computational techniques and higher-resolution MS2, high-throughput GAG sequencing is within reach. Two steps in the pipeline of biomolecule sequencing using MS2 are the discovery and the interpretation of spectral peaks. The discovery step is traditionally performed using methods that rely on the concept of averagine, the average molecular building block for the analyte in question. These methods were developed for protein sequencing and perform considerably worse on GAG sequences, due to the non-uniform distribution of sulfur atoms along the chain and the relatively high isotopic abundance of ³⁴S. The interpretation step is traditionally performed manually, which takes time and introduces potential user error. To address these problems, I developed GAGfinder, the first GAG-specific MS2 peak finding and annotation software. GAGfinder is described in detail in chapter two. Another step in MS2 sequencing is determining the sequence from the identified MS2 fragments. For a given GAG composition there are many possible sequences, and peak finding algorithms such as GAGfinder return a list of the peaks in the MS2 mass spectrum.
The many-to-many relationship between sequences and fragments can be represented as a bipartite network, and node-ranking techniques can be employed to generate likelihood scores for possible sequences. I developed a bipartite network-based sequencing tool, GAGrank, based on a bipartite network extension of Google’s PageRank algorithm for ranking websites. GAGrank is described in detail in chapter three.
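The sequence/fragment ranking idea can be illustrated with a generic personalized-PageRank walk on a bipartite graph. This is a minimal sketch, not GAGrank's published scoring: it assumes candidate sequences on one side, observed fragment peaks on the other, an edge wherever a fragment is consistent with a sequence, and peak intensity used as a hypothetical restart prior. The function name `bipartite_rank` and the toy labels (`seqA`, `f1`, ...) are illustrative only.

```python
from typing import Dict, List, Set


def bipartite_rank(
    edges: Dict[str, Set[str]],   # candidate sequence -> fragments it can explain
    intensity: Dict[str, float],  # fragment -> observed peak intensity (must cover every fragment)
    damping: float = 0.85,
    iters: int = 100,
) -> Dict[str, float]:
    """Personalized PageRank on a sequence/fragment bipartite graph (illustrative sketch).

    The walk restarts at fragment nodes in proportion to peak intensity, so
    sequences linked to many intense fragments accumulate more score.
    """
    seqs = list(edges)
    frags = list(intensity)
    # Undirected neighbour lists for both node types.
    nbrs: Dict[str, List[str]] = {n: [] for n in seqs + frags}
    for s, fs in edges.items():
        for f in fs:
            nbrs[s].append(f)
            nbrs[f].append(s)
    total = sum(intensity.values())
    restart = {n: 0.0 for n in nbrs}
    for f, w in intensity.items():
        restart[f] = w / total
    score = dict(restart)
    for _ in range(iters):
        new = {n: (1 - damping) * restart[n] for n in nbrs}
        for n, out in nbrs.items():
            if out:
                share = damping * score[n] / len(out)
                for m in out:
                    new[m] += share
            else:
                # Dangling node: return its mass via the restart distribution.
                for m in nbrs:
                    new[m] += damping * score[n] * restart[m]
        score = new
    # Report normalized likelihood-style scores for the sequence nodes only.
    seq_total = sum(score[s] for s in seqs)
    return {s: score[s] / seq_total for s in seqs}


# Toy example: seqA explains all three peaks, seqB only the shared one.
edges = {"seqA": {"f1", "f2", "f3"}, "seqB": {"f1"}}
intensity = {"f1": 10.0, "f2": 5.0, "f3": 5.0}
ranks = bipartite_rank(edges, intensity)
print(ranks)
```

Restarting only at fragment nodes is the design choice that lets experimental evidence (peak intensity) drive the ranking; a uniform restart would instead reward sequences that are merely well connected.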

    The discovery of new functional oxides using combinatorial techniques and advanced data mining algorithms

    Electroceramic materials research is a wide-ranging field driven by device applications. For many years, the demand for new materials was addressed largely through serial processing and analysis of samples often similar in composition to those already characterised. The Functional Oxide Discovery project (FOXD) is a combinatorial materials discovery project combining high-throughput synthesis and characterisation with advanced data mining to develop novel materials. Dielectric ceramics are of interest for use in telecommunications equipment; oxygen ion conductors are examined for use in fuel cell cathodes. Both applications are subject to ever-increasing industry demands, and materials designs capable of meeting these stringent requirements are urgently needed. The London University Search Instrument (LUSI) is a combinatorial robot employed for materials synthesis. Ceramic samples are produced automatically using an ink-jet printer which mixes and prints inks onto alumina slides. The slides are transferred to a furnace for sintering and transported to other locations for analysis. Production and analysis data are stored in the project database. The database forms a valuable resource, detailing the progress of the project and providing a basis for data mining. Materials design is a two-stage process. The first stage, forward prediction, is accomplished using an artificial neural network, a Baconian, inductive technique. In the second stage, the artificial neural network is inverted using a genetic algorithm. The artificial neural network prediction, stoichiometry and prediction reliability form objectives for the genetic algorithm, which yields a selection of materials designs. The full potential of this approach is realised through the manufacture and characterisation of the materials. The resulting data improve the prediction algorithms, permitting iterative improvement of the designs and the discovery of completely new materials.
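The two-stage design loop (a forward model, then a genetic algorithm that inverts it) can be sketched as below. This is a minimal illustration under stated assumptions, not the FOXD code: `surrogate` is a hypothetical linear stand-in for the trained neural network, a composition is a vector of cation fractions that must sum to 1 (the stoichiometry constraint), and only a single objective (distance from a target property) is used instead of the three objectives described above.

```python
import random
from typing import Callable, List, Tuple

Composition = List[float]


def normalise(x: Composition) -> Composition:
    # Stoichiometry constraint: cation fractions must sum to 1.
    s = sum(x)
    return [v / s for v in x]


def surrogate(x: Composition) -> float:
    # Hypothetical stand-in for a trained neural network's forward prediction.
    return 20.0 * x[0] + 80.0 * x[1] + 40.0 * x[2]


def invert(
    model: Callable[[Composition], float],
    target: float,
    n_comp: int = 3,
    pop_size: int = 40,
    generations: int = 60,
    seed: int = 0,
) -> Tuple[Composition, float]:
    """Search compositions whose predicted property is close to `target`."""
    rng = random.Random(seed)

    def fitness(x: Composition) -> float:
        return -(model(x) - target) ** 2

    pop = [normalise([rng.random() + 1e-9 for _ in range(n_comp)]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; best half survives (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            child = [w * p + (1 - w) * q for p, q in zip(a, b)]  # blend crossover
            child = [max(1e-9, v + rng.gauss(0, 0.05)) for v in child]  # Gaussian mutation
            children.append(normalise(child))
        pop = parents + children
    best = max(pop, key=fitness)
    return best, model(best)


best, predicted = invert(surrogate, target=50.0)
print(best, predicted)
```

Because the best half of each generation is carried over unchanged, the best predicted value never gets worse; in a real pipeline the reliability and stoichiometry objectives would be combined with this one, for example through a weighted sum or a Pareto ranking.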

    Metabolite signal identification in accurate mass metabolomics data with MZedDB, an interactive m/z annotation tool utilising predicted ionisation behaviour 'rules'

    BACKGROUND: Metabolomics experiments using Mass Spectrometry (MS) technology measure the mass-to-charge ratio (m/z) and intensity of ionised molecules in crude extracts of complex biological samples to generate high-dimensional metabolite 'fingerprint' or metabolite 'profile' data. High-resolution MS instruments routinely achieve a mass accuracy of < 5 ppm (parts per million), potentially providing a direct method for putative signal annotation using databases containing metabolite mass information. Most database interfaces support only simple queries with the default assumption that molecules either gain or lose a single proton when ionised. In reality the annotation process is confounded by the fact that ionisation products include not only molecular isotopes but also salt/solvent adducts and neutral loss fragments of the original metabolites. This report describes an annotation strategy that allows searching based on all potential ionisation products predicted to form during electrospray ionisation (ESI). RESULTS: Metabolite 'structures' harvested from publicly accessible databases were converted into a common format to generate a comprehensive archive in MZedDB. 'Rules' were derived from chemical information that allowed MZedDB to generate a list of adducts and neutral loss fragments putatively able to form for each structure and to calculate, on the fly, the exact molecular weight of every potential ionisation product, providing targets for annotation searches based on accurate mass. We demonstrate that data matrices representing populations of ionisation products generated from different biological matrices contain a large proportion (sometimes > 50%) of molecular isotopes, salt adducts and neutral loss fragments. Correlation analysis of ESI-MS data features confirmed the predicted relationships of m/z signals.
An integrated isotope enumerator in MZedDB allowed verification of exact isotopic pattern distributions to corroborate experimental data. CONCLUSION: We conclude that although ultra-high accurate mass instruments provide major insight into the chemical diversity of biological extracts, the facile annotation of a large proportion of signals is not possible by simple, automated query of current databases using computed molecular formulae. Parameterising MZedDB to take into account predicted ionisation behaviour and the biological source of any sample greatly improves both the frequency and the accuracy of potential annotation 'hits' in ESI-MS data.
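Rule-based annotation can be sketched as expanding each neutral monoisotopic mass into the m/z values of its predicted ionisation products and then matching observed signals within a ppm window. This is an illustrative sketch, not MZedDB's rule set: the `RULES` table is a small hypothetical positive-mode set of singly charged ions, and the two-entry database is a toy. Glucose and fructose are used because both are C6H12O6 and therefore share the same exact mass, which shows why accurate mass alone cannot separate isomers.

```python
from typing import Dict, List, Tuple

PROTON = 1.007276    # proton mass (Da)
ELECTRON = 0.000549  # electron mass (Da)
WATER = 18.010565    # monoisotopic mass of H2O (Da)

# Hypothetical positive-mode rule table: ion name -> mass shift for a singly charged ion.
RULES: Dict[str, float] = {
    "[M+H]+": PROTON,
    "[M+Na]+": 22.989770 - ELECTRON,  # 23Na atom minus one electron
    "[M+K]+": 38.963707 - ELECTRON,   # 39K atom minus one electron
    "[M+H-H2O]+": PROTON - WATER,     # protonation with neutral loss of water
}


def ion_mz(neutral_mass: float) -> Dict[str, float]:
    """Predicted m/z of every rule-derived ionisation product (charge 1)."""
    return {name: neutral_mass + shift for name, shift in RULES.items()}


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


def annotate(
    observed_mz: float, database: Dict[str, float], tol_ppm: float = 5.0
) -> List[Tuple[str, str, float]]:
    """Return (metabolite, ion type, ppm error) for every match within tolerance."""
    hits = []
    for metabolite, mass in database.items():
        for ion, mz in ion_mz(mass).items():
            err = ppm_error(observed_mz, mz)
            if abs(err) <= tol_ppm:
                hits.append((metabolite, ion, round(err, 2)))
    return hits


db = {"glucose": 180.063388, "fructose": 180.063388}
print(annotate(203.0526, db))
# → [('glucose', '[M+Na]+', -0.04), ('fructose', '[M+Na]+', -0.04)]
```

A query that assumed only [M+H]+ would return no hit for this signal; the sodium adduct rule is what makes it annotatable.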

    Rapid materials screening for renewable energy using high-throughput density functional theory

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Materials Science and Engineering, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 204-211). This thesis relates to the emerging field of high-throughput density functional theory (DFT) computation for materials design and optimization. Although high-throughput DFT is a promising new method for materials discovery, its practical implementation can be difficult. This thesis describes in detail a software infrastructure used to perform over 80,000 DFT computations. Accurately calculating total energies of diverse chemistries is an ongoing effort in the electronic structure community. We describe a method of mixing total energy calculations from different energy functionals (e.g., GGA and GGA+U) so that high-throughput calculations can be applied more accurately over a wide chemical space. Having described methods to perform accurate and rapid DFT calculations, we next move to applications. A first application relates to finding sorbents for Hg gas removal in Integrated Gasification Combined Cycle (IGCC) power plants. We demonstrate that rapid computations of amalgamation and oxidation energies can identify the most promising metal sorbents from a candidate list. In the future, more extensive candidate lists might be tested. A second application relates to the design and understanding of Li ion battery cathodes. We compute some properties of about 15,000 virtual cathode materials to identify a new cathode chemistry, Li₉V₃(P₂O₇)₃(PO₄)₂. This mixed diphosphate-phosphate material was recently synthesized both by our research group and by an outside group. We perform an in-depth computational study of Li₉V₃(P₂O₇)₃(PO₄)₂ and suggest Mo doping as an avenue for its improvement. A major concern for Li ion battery cathodes is safety with respect to O₂ release.
By examining our large data set of computations on cathode materials, we show that i) safety roughly decreases with increasing voltage and ii) for a given redox couple, polyanion groups reduce safety. These results suggest important limitations for researchers designing high-voltage cathodes. Finally, this thesis describes the beginnings of a highly collaborative 'Materials Genome' web resource to share our calculated results with the general materials community. Through the Materials Genome, we expect that the work presented in this thesis will not only contribute to the applications discussed herein, but also help make high-throughput computations accessible to the broader materials community. By Anubhav Jain. Ph.D.

    High-throughput data mined prediction of inorganic compounds and computational discovery of new lithium-ion battery cathode materials

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Materials Science and Engineering, 2011. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from PDF version of thesis. Includes bibliographical references (p. 117-129). The ability to computationally predict the properties of new materials, even prior to their synthesis, has been made possible by the accuracy of modern ab initio techniques. In some cases, high-throughput computations can be used to create large data sets of potential compounds and their computed properties. However, regardless of the field of application, such a computational high-throughput approach faces a major problem: to be relevant, the properties need to be computed on compounds (i.e., stoichiometries and crystal structures) that will be stable enough to be synthesized. In this thesis, we address this compound prediction problem through a combination of data mining and high-throughput Density Functional Theory. We first describe a method based on correlations between crystal structure prototypes that can be used with a limited computational budget to search for new ternary oxides. In addition, for the treatment of sparser data regions such as quaternaries, a new algorithm based on the data mining of ionic substitutions is proposed and analyzed. The second part of this thesis demonstrates the application of this high-throughput ab initio computing technique to the lithium-ion battery field. Here, we describe a large-scale computational search for novel cathode materials with specific battery properties, which enables experimentalists to focus on only the most promising chemistries.
Finally, to illustrate the potential of new compound computational discovery using this approach, a novel chemical class of cathode materials, the carbonophosphates, is presented along with synthesis and electrochemical results. By Geoffroy Hautier. Ph.D.
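The idea of mining ionic substitutions can be illustrated with a deliberately simplified frequency heuristic. This is not the probabilistic substitution model from the thesis: it assumes compounds are described only by a structure prototype label and an ordered tuple of ions per site, counts how often two ions are exchanged between otherwise identical known compounds of the same prototype, and then ranks single-site substitutions of a query compound by those counts. The records below are toy illustrations, and the functions `substitution_counts` and `propose` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

Compound = Tuple[str, Tuple[str, ...]]  # (structure prototype, ions ordered by site)


def substitution_counts(known: List[Compound]) -> Counter:
    """Count ion pairs that swap between known compounds differing on exactly one site."""
    counts: Counter = Counter()
    for (p1, ions1), (p2, ions2) in combinations(known, 2):
        if p1 != p2 or len(ions1) != len(ions2):
            continue
        diffs = [(a, b) for a, b in zip(ions1, ions2) if a != b]
        if len(diffs) == 1:
            counts[frozenset(diffs[0])] += 1
    return counts


def propose(query: Compound, counts: Counter, top: int = 3) -> List[Tuple[Compound, int]]:
    """Rank candidate compounds obtained by one mined substitution on the query."""
    proto, ions = query
    candidates = []
    for pair, n in counts.items():
        a, b = tuple(pair)
        for i, ion in enumerate(ions):
            if ion in pair:
                other = b if ion == a else a
                new_ions = ions[:i] + (other,) + ions[i + 1:]
                candidates.append(((proto, new_ions), n))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:top]


# Toy records: prototype label plus formal ions per site.
known = [
    ("olivine", ("Li+", "Fe2+", "P5+", "O2-")),
    ("olivine", ("Li+", "Mn2+", "P5+", "O2-")),
    ("olivine", ("Li+", "Co2+", "P5+", "O2-")),
    ("layered", ("Li+", "Co3+", "O2-")),
    ("layered", ("Na+", "Co3+", "O2-")),
]
counts = substitution_counts(known)
print(propose(("layered", ("Li+", "Fe3+", "O2-")), counts))
```

Candidates produced this way would still need a stability check (for example a high-throughput DFT calculation) and a filter against compounds already in the database; the counting step only narrows which stoichiometries are worth computing.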

    An integrated computational and experimental study to investigate Staphylococcus aureus metabolism

    Staphylococcus aureus is a metabolically versatile pathogen that colonizes nearly all organs of the human body. A detailed and comprehensive knowledge of staphylococcal metabolism is essential to understand its pathogenesis. To this end, we have reconstructed and experimentally validated an updated and enhanced genome-scale metabolic model of S. aureus USA300_FPR3757. The model combined genome annotation data, reaction stoichiometry, and regulation information from biochemical databases and previous strain-specific models. Reactions in the model were checked and fixed to ensure chemical balance and thermodynamic consistency. To further refine the model, growth assessment of 1920 nonessential mutants from the Nebraska Transposon Mutant Library was performed, and metabolite excretion profiles of important mutants in carbon and nitrogen metabolism were determined. The growth/no-growth inconsistencies between the model predictions and in vivo essentiality data were resolved through extensive manual curation based on optimization-based reconciliation algorithms. After intensive curation and refinement, the model contains 863 metabolic genes, 1379 metabolites (including 1159 unique metabolites), and 1545 reactions, including transport and exchange reactions. To improve the accuracy and predictive ability of the model under changing environmental conditions, condition-specific regulation information curated from the existing knowledgebase was incorporated. These critical additions significantly improved the model's performance in capturing gene essentiality, substrate utilization, and metabolite production capabilities, and increased its ability to generate model-based discoveries of therapeutic significance. Use of this highly curated model will enhance the functional utility of omics data and, therefore, serve as a resource to support future investigations of S. aureus and to augment staphylococcal research worldwide.
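Genome-scale metabolic models are commonly analysed with flux balance analysis (FBA): maximise a biomass flux subject to steady state (S·v = 0) and flux bounds, and simulate a gene knockout by closing the reactions that depend on that gene. The sketch below applies this to a hypothetical four-reaction toy network, not the S. aureus model, and uses `scipy.optimize.linprog`. The gene-protein-reaction map `GPR` and the gene names `g1`-`g3` are illustrative assumptions; only "OR" (isozyme) logic is handled.

```python
from typing import Dict, FrozenSet, List, Set

from scipy.optimize import linprog

# Toy network (hypothetical). Columns are reactions, rows are metabolites A and B.
REACTIONS = ["uptake", "R1", "R2", "biomass"]
S = [
    [1, -1, -1, 0],  # A: produced by uptake, consumed by R1 or R2
    [0, 1, 1, -1],   # B: produced by R1 or R2, consumed by biomass
]
BOUNDS = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]
# Hypothetical gene-protein-reaction map; R1 and R2 are isozymes, so blocking
# a reaction requires knocking out every gene listed for it.
GPR: Dict[str, Set[str]] = {"uptake": {"g3"}, "R1": {"g1"}, "R2": {"g2"}, "biomass": set()}


def max_growth(knockouts: FrozenSet[str] = frozenset()) -> float:
    """Maximal biomass flux with the given genes knocked out (0.0 if infeasible)."""
    bounds = []
    for rxn, (lo, hi) in zip(REACTIONS, BOUNDS):
        genes = GPR[rxn]
        blocked = bool(genes) and genes <= knockouts
        bounds.append((0, 0) if blocked else (lo, hi))
    c = [0.0] * len(REACTIONS)
    c[REACTIONS.index("biomass")] = -1.0  # linprog minimises, so negate biomass
    res = linprog(c, A_eq=S, b_eq=[0, 0], bounds=bounds, method="highs")
    return -res.fun if res.status == 0 else 0.0


def essential_genes(threshold: float = 1e-6) -> List[str]:
    """Genes whose single knockout drops predicted growth below `threshold`."""
    genes = sorted(set().union(*GPR.values()))
    return [g for g in genes if max_growth(frozenset({g})) < threshold]


print(max_growth())       # wild-type growth, limited by the uptake bound
print(essential_genes())  # predicted no-growth single knockouts
```

Comparing `essential_genes()` against measured growth of transposon mutants is the kind of growth/no-growth check described above; each disagreement points to a reaction, bound, or gene association worth curating.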

    Integrating glycomics, proteomics and glycoproteomics to understand the structural basis for influenza A virus evolution and glycan-mediated immune interactions

    Glycosylation modulates the range and specificity of interactions among glycoproteins and their binding partners. This is important in influenza A virus (IAV) biology because binding of host immune molecules depends on glycosylation of viral surface proteins such as hemagglutinin (HA). Circulating viruses mutate rapidly in response to pressure from the host immune system. As proteins mutate, the virus glycosylation patterns change. The consequence is that viruses evolve to evade host immune responses, which renders vaccines ineffective. Glycan biosynthesis is a non-template-driven process, governed by stoichiometric and steric relationships between the enzymatic machinery for glycosylation and the protein being glycosylated. Consequently, protein glycosylation is heterogeneous, making structural analysis and elucidation of precise biological functions extremely challenging. The lack of structural information has been a limiting factor in understanding the exact mechanisms of glycan-mediated interactions of IAV with host immune lectins. Genetic sequencing methods allow prediction of glycosylation sites along the protein backbone but cannot provide exact phenotypic information regarding site occupancy. Crystallography methods are also unable to determine glycan structures beyond the core residues due to the flexible nature of carbohydrates. This dissertation centers on the development of chromatography and mass spectrometry methods for characterizing site-specific glycosylation in complex glycoproteins and on the application of these methods to IAV glycomics and glycoproteomics. We combined the site-specific glycosylation information generated using mass spectrometry with information from biochemical assays and structural modeling studies to identify key glycosylation sites mediating interactions of HA with the immune lectin surfactant protein D (SP-D).
We also identified the structural features that control glycan processing at these sites, particularly those involving glycan maturation from high-mannose to complex-type, which, in turn, regulate interactions with SP-D. The work presented in this dissertation contributes significantly to the improvement of analytical and bioinformatics methods in glycan and glycoprotein analysis using mass spectrometry and greatly advances the understanding of the structural features regulating glycan microheterogeneity on HA and its interactions with host immune lectins.
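Sequence-based prediction of N-glycosylation sites usually scans the protein for the N-X-S/T sequon, where X is any residue except proline. The sketch below implements that scan with a regular-expression lookahead so that overlapping sequons are all reported. It illustrates why such predictions yield only potential sites: the scan says nothing about whether a site is actually occupied or which glycan it carries, which is what the mass spectrometry methods above measure. The peptide string is hypothetical, and the simple rule ignores refinements such as reduced efficiency when S/T is followed by proline.

```python
import re
from typing import List

# N-X-S/T sequon with X != P; the lookahead lets overlapping sequons all match.
SEQUON = re.compile(r"N(?=[^P][ST])")


def potential_n_sites(protein: str) -> List[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


peptide = "MKNGTANPSLNNSTQ"  # hypothetical sequence
print(potential_n_sites(peptide))
# → [3, 11, 12]
```

Position 7 (N-P-S) is skipped because proline at X blocks the motif, while positions 11 and 12 overlap and are both reported.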