19,937 research outputs found

    Toward a multilevel representation of protein molecules: comparative approaches to the aggregation/folding propensity problem

    Full text link
    This paper builds upon the fundamental work of Niwa et al. [34], which provides the unique possibility to analyze the relative aggregation/folding propensity of the elements of the entire Escherichia coli (E. coli) proteome in a cell-free standardized microenvironment. The hardness of the problem comes from the superposition between the driving forces of intra- and inter-molecule interactions and it is mirrored by the evidences of shift from folding to aggregation phenotypes by single-point mutations [10]. Here we apply several state-of-the-art classification methods coming from the field of structural pattern recognition, with the aim to compare different representations of the same proteins gathered from the Niwa et al. data base; such representations include sequences and labeled (contact) graphs enriched with chemico-physical attributes. By this comparison, we are able to identify also some interesting general properties of proteins. Notably, (i) we suggest a threshold around 250 residues discriminating "easily foldable" from "hardly foldable" molecules consistent with other independent experiments, and (ii) we highlight the relevance of contact graph spectra for folding behavior discrimination and characterization of the E. coli solubility data. The soundness of the experimental results presented in this paper is proved by the statistically relevant relationships discovered among the chemico-physical description of proteins and the developed cost matrix of substitution used in the various discrimination systems.Comment: 17 pages, 3 figures, 46 reference

    CBR and MBR techniques: review for an application in the emergencies domain

    Get PDF
    The purpose of this document is to provide an in-depth analysis of current reasoning engine practice and the integration strategies of Case Based Reasoning and Model Based Reasoning that will be used in the design and development of the RIMSAT system. RIMSAT (Remote Intelligent Management Support and Training) is a European Commission funded project designed to: a.. Provide an innovative, 'intelligent', knowledge based solution aimed at improving the quality of critical decisions b.. Enhance the competencies and responsiveness of individuals and organisations involved in highly complex, safety critical incidents - irrespective of their location. In other words, RIMSAT aims to design and implement a decision support system that using Case Base Reasoning as well as Model Base Reasoning technology is applied in the management of emergency situations. This document is part of a deliverable for RIMSAT project, and although it has been done in close contact with the requirements of the project, it provides an overview wide enough for providing a state of the art in integration strategies between CBR and MBR technologies.Postprint (published version

    Impact of Embedded Carbon Fiber Heating Panel on the Structural/Mechanical Performance of Roadway Pavement

    Get PDF
    INE/AUTC 12.3

    Computational Tools for the Untargeted Assignment of FT-MS Metabolomics Datasets

    Get PDF
    Metabolomics is the study of metabolomes, the sets of metabolites observed in living systems. Metabolism interconverts these metabolites to provide the molecules and energy necessary for life processes. Many disease processes, including cancer, have a significant metabolic component that manifests as differences in what metabolites are present and in what quantities they are produced and utilized. Thus, using metabolomics, differences between metabolomes in disease and non-disease states can be detected and these differences improve our understanding of disease processes at the molecular level. Despite the potential benefits of metabolomics, the comprehensive investigation of metabolomes remains difficult. A popular analytical technique for metabolomics is mass spectrometry. Advances in Fourier transform mass spectrometry (FT-MS) instrumentation have yielded simultaneous improvements in mass resolution, mass accuracy, and detection sensitivity. In the metabolomics field, these advantages permit more complicated, but more informative experimental designs such as the use of multiple isotope-labeled precursors in stable isotope-resolved metabolomics (SIRM) experiments. However, despite these potential applications, several outstanding problems hamper the use of FT-MS for metabolomics studies. First, artifacts and data quality problems in FT-MS spectra can confound downstream data analyses, confuse machine learning models, and complicate the robust detection and assignment of metabolite features. Second, the assignment of observed spectral features to metabolites remains difficult. Existing targeted approaches for assignment often employ databases of known metabolites; however, metabolite databases are incomplete, thus limiting or biasing assignment results. Additionally, FT-MS provides limited structural information for observed metabolites, which complicates the determination of metabolite class (e.g. lipid, sugar, etc. ) for observed metabolite spectral features, a necessary step for many metabolomics experiments. To address these problems, a set of tools were developed. The first tool identifies artifacts with high peak density observed in many FT-MS spectra and removes them safely. Using this tool, two previously unreported types of high peak density artifact were identified in FT-MS spectra: fuzzy sites and partial ringing. Fuzzy sites were particularly problematic as they confused and reduced the accuracy of machine learning models trained on datasets containing these artifacts. Second, a tool called SMIRFE was developed to assign isotope-resolved molecular formulas to observed spectral features in an untargeted manner without a database of expected metabolites. This new untargeted method was validated on a gold-standard dataset containing both unlabeled and 15N-labeled compounds and was able to identify 18 of 18 expected spectral features. Third, a collection of machine learning models was constructed to predict if a molecular formula corresponds to one or more lipid categories. These models accurately predict the correct one of eight lipid categories on our training dataset of known lipid and non-lipid molecular formulas with precisions and accuracies over 90% for most categories. These models were used to predict lipid categories for untargeted SMIRFE-derived assignments in a non-small cell lung cancer dataset. Subsequent differential abundance analysis revealed a sub-population of non-small cell lung cancer samples with a significantly increased abundance in sterol lipids. This finding implies a possible therapeutic role of statins in the treatment and/or prevention of non-small cell lung cancer. Collectively these tools represent a pipeline for FT-MS metabolomics datasets that is compatible with isotope labeling experiments. With these tools, more robust and untargeted metabolic analyses of disease will be possible

    Increasing pattern recognition accuracy for chemical sensing by evolutionary based drift compensation

    Get PDF
    Artificial olfaction systems, which mimic human olfaction by using arrays of gas chemical sensors combined with pattern recognition methods, represent a potentially low-cost tool in many areas of industry such as perfumery, food and drink production, clinical diagnosis, health and safety, environmental monitoring and process control. However, successful applications of these systems are still largely limited to specialized laboratories. Sensor drift, i.e., the lack of a sensor's stability over time, still limits real in dustrial setups. This paper presents and discusses an evolutionary based adaptive drift-correction method designed to work with state-of-the-art classification systems. The proposed approach exploits a cutting-edge evolutionary strategy to iteratively tweak the coefficients of a linear transformation which can transparently correct raw sensors' measures thus mitigating the negative effects of the drift. The method learns the optimal correction strategy without the use of models or other hypotheses on the behavior of the physical chemical sensors

    Multi-tier framework for the inferential measurement and data-driven modeling

    Get PDF
    A framework for the inferential measurement and data-driven modeling has been proposed and assessed in several real-world application domains. The architecture of the framework has been structured in multiple tiers to facilitate extensibility and the integration of new components. Each of the proposed four tiers has been assessed in an uncoupled way to verify their suitability. The first tier, dealing with exploratory data analysis, has been assessed with the characterization of the chemical space related to the biodegradation of organic chemicals. This analysis has established relationships between physicochemical variables and biodegradation rates that have been used for model development. At the preprocessing level, a novel method for feature selection based on dissimilarity measures between Self-Organizing maps (SOM) has been developed and assessed. The proposed method selected more features than others published in literature but leads to models with improved predictive power. Single and multiple data imputation techniques based on the SOM have also been used to recover missing data in a Waste Water Treatment Plant benchmark. A new dynamic method to adjust the centers and widths of in Radial basis Function networks has been proposed to predict water quality. The proposed method outperformed other neural networks. The proposed modeling components have also been assessed in the development of prediction and classification models for biodegradation rates in different media. The results obtained proved the suitability of this approach to develop data-driven models when the complex dynamics of the process prevents the formulation of mechanistic models. The use of rule generation algorithms and Bayesian dependency models has been preliminary screened to provide the framework with interpretation capabilities. Preliminary results obtained from the classification of Modes of Toxic Action (MOA) indicate that this could be a promising approach to use MOAs as proxy indicators of human health effects of chemicals.Finally, the complete framework has been applied to three different modeling scenarios. A virtual sensor system, capable of inferring product quality indices from primary process variables has been developed and assessed. The system was integrated with the control system in a real chemical plant outperforming multi-linear correlation models usually adopted by chemical manufacturers. A model to predict carcinogenicity from molecular structure for a set of aromatic compounds has been developed and tested. Results obtained after the application of the SOM-dissimilarity feature selection method yielded better results than models published in the literature. Finally, the framework has been used to facilitate a new approach for environmental modeling and risk management within geographical information systems (GIS). The SOM has been successfully used to characterize exposure scenarios and to provide estimations of missing data through geographic interpolation. The combination of SOM and Gaussian Mixture models facilitated the formulation of a new probabilistic risk assessment approach.Aquesta tesi proposa i avalua en diverses aplicacions reals, un marc general de treball per al desenvolupament de sistemes de mesurament inferencial i de modelat basats en dades. L'arquitectura d'aquest marc de treball s'organitza en diverses capes que faciliten la seva extensibilitat així com la integració de nous components. Cadascun dels quatre nivells en que s'estructura la proposta de marc de treball ha estat avaluat de forma independent per a verificar la seva funcionalitat. El primer que nivell s'ocupa de l'anàlisi exploratòria de dades ha esta avaluat a partir de la caracterització de l'espai químic corresponent a la biodegradació de certs compostos orgànics. Fruit d'aquest anàlisi s'han establert relacions entre diverses variables físico-químiques que han estat emprades posteriorment per al desenvolupament de models de biodegradació. A nivell del preprocés de les dades s'ha desenvolupat i avaluat una nova metodologia per a la selecció de variables basada en l'ús del Mapes Autoorganitzats (SOM). Tot i que el mètode proposat selecciona, en general, un major nombre de variables que altres mètodes proposats a la literatura, els models resultants mostren una millor capacitat predictiva. S'han avaluat també tot un conjunt de tècniques d'imputació de dades basades en el SOM amb un conjunt de dades estàndard corresponent als paràmetres d'operació d'una planta de tractament d'aigües residuals. Es proposa i avalua en un problema de predicció de qualitat en aigua un nou model dinàmic per a ajustar el centre i la dispersió en xarxes de funcions de base radial. El mètode proposat millora els resultats obtinguts amb altres arquitectures neuronals. Els components de modelat proposat s'han aplicat també al desenvolupament de models predictius i de classificació de les velocitats de biodegradació de compostos orgànics en diferents medis. Els resultats obtinguts demostren la viabilitat d'aquesta aproximació per a desenvolupar models basats en dades en aquells casos en els que la complexitat de dinàmica del procés impedeix formular models mecanicistes. S'ha dut a terme un estudi preliminar de l'ús de algorismes de generació de regles i de grafs de dependència bayesiana per a introduir una nova capa que faciliti la interpretació dels models. Els resultats preliminars obtinguts a partir de la classificació dels Modes d'acció Tòxica (MOA) apunten a que l'ús dels MOA com a indicadors intermediaris dels efectes dels compostos químics en la salut és una aproximació factible.Finalment, el marc de treball proposat s'ha aplicat en tres escenaris de modelat diferents. En primer lloc, s'ha desenvolupat i avaluat un sensor virtual capaç d'inferir índexs de qualitat a partir de variables primàries de procés. El sensor resultant ha estat implementat en una planta química real millorant els resultats de les correlacions multilineals emprades habitualment. S'ha desenvolupat i avaluat un model per a predir els efectes carcinògens d'un grup de compostos aromàtics a partir de la seva estructura molecular. Els resultats obtinguts desprès d'aplicar el mètode de selecció de variables basat en el SOM milloren els resultats prèviament publicats. Aquest marc de treball s'ha usat també per a proporcionar una nova aproximació al modelat ambiental i l'anàlisi de risc amb sistemes d'informació geogràfica (GIS). S'ha usat el SOM per a caracteritzar escenaris d'exposició i per a desenvolupar un nou mètode d'interpolació geogràfica. La combinació del SOM amb els models de mescla de gaussianes dona una nova formulació al problema de l'anàlisi de risc des d'un punt de vista probabilístic
    corecore