1,449 research outputs found

    Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

    This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries and with imbalance in the classes of interest, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for the development of specialized techniques for data preprocessing and classification. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and the expectation-maximization imputation method for missing values, which relies on iterated regression analyses. We compare classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values as well as real data in health applications, and show that our multilevel SVM-based method produces fast, more accurate, and more robust classification results. Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625

    Computational Methods for Omics Sequence Data with Focus on Non-Model Organisms

    Sequence data are the backbone of many biological research areas, including but not limited to genomics, proteomics, and proteogenomics. Sequence acquisition is facilitated by a wide selection of advanced technologies such as Next Generation Sequencing and Mass Spectrometry. These high-throughput methods produce substantial volumes of data at decreasing cost in money and time. These volumes of data render manual processing impossible and therefore require state-of-the-art computational methods for adequate analysis and interpretation. In proteogenomics, the potential of combining omics methods to improve sequence quality and availability is frequently emphasized, in particular for non-model organisms. In this thesis, we highlight and address several challenges in the “life cycle” of omics sequence data, from genome sequence acquisition through integrated evaluation to extensive utilization of comprehensive sequence collections. We describe several methods with applications in different omics areas and emphasize means of potential integrative analysis. First, we introduce a machine-learning-based method for ranking the quality of de novo assembly contigs. We demonstrate its particular potential for metagenomic sequence data, which usually feature a variety of previously sequenced as well as unsequenced, non-model organisms. Next, we elaborate on the availability of target sequences in databases considered for taxonomic classification of tandem MS spectra. Here, the effect of different sequence sources as well as different search strategies on taxonomic depth is taken into account. Finally, we introduce a novel approach for extensive taxonomic classification by iteratively processing recent and comprehensive protein sequence databases. We discuss diverse possibilities as well as the limits of our methods with respect to the currently available public data. 
Thereby, we illustrate potential benefits of the presented methods for non-model organisms.

    Novel technologies enabling streamlined complete proteome analysis


    Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics.

    The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. We here critically discuss structured elucidation approaches and software that are designed to help during the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases such as the coalition of MassBank repositories and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (Critical Assessment of Small Molecule Identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included.

    The Transcriptional Landscape of the Photosynthetic Model Cyanobacterium Synechocystis sp. PCC6803.

    Cyanobacteria exhibit a great capacity to adapt to different environmental conditions through changes in gene expression. Although this plasticity has been extensively studied in the model cyanobacterium Synechocystis sp. PCC 6803, a detailed analysis of the coordinated transcriptional adaptation across varying conditions is lacking. Here, we report a meta-analysis of 756 individual microarray measurements conducted in 37 independent studies, the most comprehensive study of the Synechocystis transcriptome to date. Using stringent statistical evaluation, we characterized the coordinated adaptation of Synechocystis' gene expression at the systems level. Evaluation of the data revealed that the photosynthetic apparatus is subjected to greater changes in expression than other cellular components. Nevertheless, network analyses indicated a significant degree of transcriptional coordination of photosynthesis and various metabolic processes, and revealed the tight co-regulation of components of photosystems I, II and phycobilisomes. Detailed inspection of the integrated data led to the discovery of a variety of regulatory patterns and novel putative photosynthetic genes. Intriguingly, global clustering analyses suggested contrasting transcriptional responses of metabolic and regulatory genes to stress conditions. The integrated Synechocystis transcriptome can be accessed and interactively analyzed via the CyanoEXpress website (http://cyanoexpress.sysbiolab.eu).

    Context-based analysis of mass spectrometry proteomics data


    Hypoxia induces a glycolytic complex in intestinal epithelial cells independent of HIF-1-driven glycolytic gene expression

    The metabolic adaptation of eukaryotic cells to hypoxia involves increasing dependence upon glycolytic adenosine triphosphate (ATP) production, an event with consequences for cellular bioenergetics and cell fate. This response is regulated at the transcriptional level by the hypoxia-inducible factor-1 (HIF-1)-dependent transcriptional upregulation of glycolytic enzymes (GEs) and glucose transporters. However, this transcriptional upregulation alone is unlikely to account fully for the levels of glycolytic ATP produced during hypoxia. Here, we investigated additional mechanisms regulating glycolysis in hypoxia. We observed that intestinal epithelial cells treated with inhibitors of transcription or translation and human platelets (which lack nuclei and the capacity for canonical transcriptional activity) maintained the capacity for hypoxia-induced glycolysis, a finding which suggests the involvement of a nontranscriptional component to the hypoxia-induced metabolic switch to a highly glycolytic phenotype. In our investigations into potential nontranscriptional mechanisms for glycolytic induction, we identified a hypoxia-sensitive formation of complexes comprising GEs and glucose transporters in intestinal epithelial cells. Surprisingly, the formation of such glycolytic complexes occurs independent of HIF-1-driven transcription. Finally, we provide evidence for the presence of HIF-1α in cytosolic fractions of hypoxic cells which physically interacts with the glucose transporter GLUT1 and the GEs in a hypoxia-sensitive manner. In conclusion, we provide insights into the nontranscriptional regulation of hypoxia-induced glycolysis in intestinal epithelial cells.

    Understanding and Engineering Metabolic Feedback Regulation of Amino Acid Metabolism in Escherichia coli

    Metabolism is the core of what we consider to be a living cell. It covers all chemical reactions that are necessary to break down nutrients and convert them into energy and cellular building blocks for growth. These chemical reactions comprise a large metabolic network that is subject to tight feedback regulation of enzyme activities or abundances. However, even in intensively studied model organisms like Escherichia coli, knowledge about the function of feedback-regulatory mechanisms and how they interact to control metabolism is still sparse. Therefore, the first goal of this study was to understand the function and relevance of metabolic feedback regulation using amino acid metabolism in E. coli as a case study. The second goal was to use the knowledge about metabolic feedback regulation to engineer microbial cell factories for the production of amino acids like L-arginine. In Chapter 1 we constructed a panel of 7 mutants with allosterically dysregulated amino acid pathways to uncover the relevance and function of allosteric feedback inhibition in vivo, which had so far been demonstrated only by theoretical studies. By combining metabolomics, proteomics and flux profiling we could show that allosteric feedback inhibition is crucial to adjust a reserve of biosynthetic enzymes. Such enzyme overabundance originates from a sensitive interaction between control of enzyme activity (allosteric feedback inhibition) and enzyme abundance (transcriptional regulation). Furthermore, we used a metabolic model and CRISPR interference experiments to show that enzyme overabundance renders cells more robust against genetic perturbations. In Chapter 2 we increased the fitness of a rationally engineered arginine overproduction strain by retaining a certain level of transcriptional regulation. To this end, we titrated the transcription factor ArgR by CRISPRi and compared these different levels of transcriptional regulation with an ArgR knockout strain. 
Using the CRISPRi approach we elevated the growth rate of an overproduction strain two-fold compared to the knockout strain, without impairing arginine production rates and titer. Metabolomics and proteomics experiments revealed that the slow growth of the knockout strain derives from limitations in pyrimidine nucleotide metabolism and that these limitations are caused by imbalances of enzyme levels at critical branching points. Thus, we demonstrated the importance of balancing enzymes in an overproduction pathway and that CRISPRi is a suitable tool for this purpose. In Chapter 3 we show how cells respond to genetic perturbation on the molecular scale. To this end, we perturbed amino acid biosynthesis genes with CRISPRi and analyzed the transcriptional response with GFP-reporter plasmids and proteomics. These experiments revealed that cells elevate the expression of genes in a perturbed pathway to counteract a genetic perturbation (we will refer to this mechanism as transcriptional compensation). Metabolomics and flow cytometry data of the wild-type and the allosteric mutant demonstrated the benefit of enzyme overabundance in response to genetic perturbations: cells without overabundance showed a heterogeneous transcriptional compensation even to mild perturbations, whereas in wild-type cells such mild perturbations were buffered by enzyme overabundance. In Chapter 4 we consider amino acid degradation pathways as an additional regulatory mechanism for the maintenance of end-product homeostasis. Nutritional downshift experiments revealed increased robustness of allosteric mutants in which the respective degradation pathway was up-regulated. By dynamic metabolite measurements we showed that E. coli channels an excess of arginine into the degradation pathway. This overflow mechanism might be the reason for the robustness of allosteric mutants under dynamic conditions.

    Integrated Physiological, Proteomic, and Metabolomic Analysis of Ultra Violet (UV) Stress Responses and Adaptation Mechanisms in Pinus radiata

    Globally expected changes in environmental conditions, especially the increase of UV irradiation, necessitate extending our knowledge of the mechanisms mediating tree species adaptation to this stress. This is crucial for designing new strategies to maintain future forest productivity. Studies focused on environmentally realistic dosages of UV irradiation in forest species are scarce. Pinus spp. are commercially relevant trees and not much is known about their adaptation to UV. In this work, UV treatment and recovery of Pinus radiata plants with dosages mimicking future scenarios, based on current models of UV radiation, were performed in a time-dependent manner. The combined metabolome and proteome analyses were complemented with measurements of physiological parameters and gene expression. Sparse PLS analysis revealed complex molecular interaction networks of molecular and physiological data. Early responses prevented phototoxicity by reducing photosystem activity and the electron transfer chain together with the accumulation of photoprotectors and photorespiration. Apart from the reduction in photosynthesis as a consequence of the direct UV damage on the photosystems, the primary metabolism was rearranged to deal with the oxidative stress while minimizing ROS production. New protein kinases and proteases related to signaling, coordination, and regulation of UV stress responses were revealed. All these processes demonstrate a complex molecular interaction network extending the current knowledge on UV-stress adaptation in pine.