73 research outputs found

    Impact of the spotted microarray preprocessing method on fold-change compression and variance stability

    Background: The standard approach for preprocessing spotted microarray data is to subtract the local background intensity from the spot foreground intensity, perform a log2 transformation, and normalize the data with a global median or a lowess normalization. Although well motivated, standard approaches for background correction and for transformation have been widely criticized because they produce high variance at low intensities. Although various alternatives to the standard background correction methods and to the log2 transformation have been proposed, the impacts of these two successive preprocessing steps have not been compared in an objective way. Results: In this study, we assessed the impact of eight preprocessing methods combining four background correction methods and two transformations (the log2 and the glog), using data from the MAQC study. The results indicate that most preprocessing methods produce fold-change compression at low intensities. Fold-change compression was minimized using the Standard and the Edwards background correction methods coupled with a log2 transformation. The drawback of both methods is a high variance at low intensities, which consequently produced poor estimations of the p-values. On the other hand, effective stabilization of the variance as well as better estimations of the p-values were observed after the glog transformation. Conclusion: As both fold-change magnitudes and p-values are important in microarray class comparison studies, we recommend combining the Edwards correction with a hybrid transformation method that uses the log2 transformation to estimate fold-change magnitudes and the glog transformation to estimate p-values.
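The contrast between the log2 and glog transformations in the abstract above can be illustrated with a minimal sketch. It assumes one common form of the generalized logarithm, glog(x) = log2((x + sqrt(x^2 + c^2)) / 2); the function names and the value of the tuning constant `c` are hypothetical and are not taken from the study, which may use a different parameterization.

```python
import math


def log2_transform(foreground, background):
    """Standard preprocessing: subtract local background, then take log2.

    Returns NaN when the background-corrected intensity is not positive,
    because log2 is undefined there.
    """
    corrected = foreground - background
    return math.log2(corrected) if corrected > 0 else float("nan")


def glog_transform(foreground, background, c=50.0):
    """Generalized log of the background-corrected intensity.

    `c` is an illustrative tuning constant (an assumption, not a value from
    the study). For intensities much larger than c the result approaches
    log2; near zero the curve flattens, which stabilizes variance but
    compresses fold changes.
    """
    corrected = foreground - background
    return math.log2((corrected + math.sqrt(corrected ** 2 + c ** 2)) / 2.0)


if __name__ == "__main__":
    # A true two-fold change at low intensity (10 vs 20, no background).
    low_log2 = log2_transform(20, 0) - log2_transform(10, 0)
    low_glog = glog_transform(20, 0) - glog_transform(10, 0)
    print("log2 difference at low intensity:", low_log2)
    print("glog difference at low intensity:", low_glog)  # smaller than log2
    # A spot whose background exceeds its foreground.
    print("log2 of negative corrected value:", log2_transform(90, 100))
    print("glog of negative corrected value:", glog_transform(90, 100))
```

Under these assumptions, the glog difference for a two-fold change at low intensity is smaller than one log2 unit, which is the fold-change compression the abstract describes, while glog remains defined for spots whose background exceeds the foreground.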

    Statistics in Gene Expression, Metabolomics, and Comparative Genomics in Evolution

    This thesis contains four papers concerning (I) the evolutionary conservation of drug targets and its potential use in environmental risk assessments, (II) RNA degradation as a control mechanism during osmotic stress in the yeast S. cerevisiae, (III) the localization and effects of the gene DDIT3, which encodes a key regulator of the stress response, and (IV) the integration and analysis of transcriptional and metabolic data to identify active metabolic pathways. Environmental risk assessments are needed for the approval of new pharmaceutical compounds. To date, risk assessments have mainly focused on organisms like algae and Daphnia. The conservation of drug targets in species relevant for ecotoxicity testing is a key aspect in developing more targeted test strategies on higher organisms like fish or amphibians. With information on predicted proteomes for a wide range of species, it is possible to extract data on the evolutionary conservation of drug targets. In paper I, orthology data are compiled and analyzed for a set of human drug targets in several species, and the result is evaluated based on an extensive literature search. mRNA degradation can be investigated on a genome-wide scale by using a transcriptional inhibitor and subsequently hybridizing RNA pools, isolated at a set of time points, to microarrays. Because of the complexity of the microarray methodology in this context, the data need processing and transformation to yield relevant information on changes in degradation rates. In paper II, mRNA degradation is investigated as a post-transcriptional control effect in connection with hyperosmotic stress. We conclude that mRNA degradation mechanisms are important regulatory keys in the stress response. The gene DDIT3 encodes a protein that acts as a regulator of the stress response within human cells. Stress types that induce DDIT3 transcription include DNA damage, hypoxia, and starvation.
DDIT3 is a transcription factor and has mainly been reported as a nuclear protein. In paper III, the effects and target genes of DDIT3 are investigated using techniques like microarrays, RT-qPCR, and various bioinformatic and statistical methods. We report that DDIT3 can also be localized to the cytoplasm, where it induces or represses different genes compared to the nuclear form. The cytoplasmic form of DDIT3 is involved in migration and inhibits the migratory effects of fibrosarcoma cells. The development of different 'omics' technologies in molecular biology has resulted in several methods to characterize cells and tissues, for example microarrays to characterize the transcriptome (collection of gene transcripts) and spectrometry techniques like NMR to describe the metabolome (collection of small molecules). Interpretation of different 'omics' data is usually done separately, and often with respect to pathways, which are sets of reactions involving genes, metabolites, and proteins. A common research question is which pathways are active (regulated) when comparing two or more conditions. In paper IV, we propose a model to make such pathway-level decisions by integrating transcriptomic and metabolomic data.
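The time-course measurement described for paper II in the abstract above (RNA abundance sampled at several time points after transcription is blocked) can be sketched as a decay-rate estimate. This is a minimal illustration that assumes simple first-order decay and an ordinary least-squares fit on log-transformed abundances; the thesis's actual processing and transformation steps are not given in the abstract, and the function names and sample values here are hypothetical.

```python
import math


def decay_rate(times, abundances):
    """Estimate a first-order decay constant k from a time course.

    Assumes abundance(t) = A0 * exp(-k * t), so ln(abundance) is linear in t
    with slope -k. The slope is fitted by ordinary least squares.
    """
    logs = [math.log(a) for a in abundances]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    return -sxy / sxx


def half_life(k):
    """Half-life implied by decay constant k (same time unit as the input)."""
    return math.log(2) / k


if __name__ == "__main__":
    # Synthetic example values (illustrative only): minutes after inhibition.
    times = [0, 10, 20, 30]
    abundances = [100.0 * math.exp(-0.05 * t) for t in times]
    k = decay_rate(times, abundances)
    print("decay constant per minute:", k)
    print("half-life in minutes:", half_life(k))
```

Comparing the fitted constant for the same transcript under control and stress conditions would be one simple way to express a change in degradation rate, under the first-order assumption stated above.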

    Burkitt lymphoma classification and MYC-associated non-Burkitt lymphoma investigation based on gene expression

    Burkitt lymphoma and diffuse large B-cell lymphoma are two closely related types of lymphoma that are managed differently in clinical practice, so accurate diagnosis is key to treatment decisions. However, under current criteria that combine morphological, immunophenotypic, and genetic characteristics, a significant number of cases exhibit overlapping features that make diagnosis and treatment decisions difficult. In particular, the prognosis has been reported to be significantly unfavourable in a subset of cases that are initially diagnosed as diffuse large B-cell lymphoma but bear a MYC gene translocation, which is a defining feature of Burkitt lymphoma but can also be found in other lymphomas. Despite the adverse effect of MYC in aggressive lymphomas other than Burkitt lymphoma, the underlying mechanism and effective treatment are still unclear. Recent technological advances have made it possible to investigate an enormous number of biomolecules simultaneously, and the scientific fields associated with measuring molecular data in such a high-throughput way are usually called “omics”. For example, genomics assesses thousands of DNA sequences and transcriptomics assays large numbers of transcripts in a single experiment. These techniques, together with rapidly emerging analytical methods in bioinformatics, have brought cancer research into a new era. The growing amount of omics data has significantly influenced the understanding of lymphomas and holds great promise for classifying subtypes and predicting treatment responses, which will eventually lead to personalized therapy. In this study, we investigate the discrimination of Burkitt lymphoma and diffuse large B-cell lymphoma based on DNA microarray gene expression data, which has contributed most to the molecular classification of lymphoma subtypes in the last decade.
On the basis of two previous research-level gene expression profiling classifiers, we developed a robust classifier that works effectively on different platforms and on the formalin-fixed paraffin-embedded samples commonly used in routine clinical practice. Validation of the classifier on samples from clinical patients achieves high agreement with the diagnosis made in a central haematopathology laboratory and provides a potential outcome indication for patients presenting intermediate features. In addition, we explore the role of MYC in the above lymphomas. Our investigation emphasizes the adverse impact of high-level MYC mRNA expression on patients’ outcome, and the functional analysis of genes associated with high MYC expression shows significant enrichment of molecular mechanisms of proliferation and metabolic processes. Moreover, the gene PRMT5 is found to be highly correlated with MYC expression, which suggests a possible therapeutic target for treatment.

    Functional characterization and annotation of trait-associated genomic regions by transcriptome analysis

    In this work, two novel implementations are presented that could assist in the design and data analysis of high-throughput genomic experiments. The first is an efficient and flexible tiling probe selection pipeline based on a penalized uniqueness score, which can be employed in the design of genome tiling tasks of various types and scales. The second is a novel hidden semi-Markov model (HSMM) implementation, made available within the Bioconductor project, which provides a unified interface for segmenting genomic data in a wide range of research subjects.

    Applications of MATLAB in Science and Engineering

    The book consists of 24 chapters illustrating a wide range of areas where MATLAB tools are applied. These areas include mathematics, physics, chemistry and chemical engineering, mechanical engineering, biological (molecular biology) and medical sciences, communication and control systems, digital signal, image and video processing, and system modeling and simulation. Many interesting problems are included throughout the book, and its contents will be beneficial for students and professionals across a wide range of fields.

    Advanced Visual Analytics Approaches for the Integrative Study of Genomic and Transcriptomic Data

    Advances in next-generation sequencing (NGS) technology have enabled rapid and cost-effective whole-genome analyses. It is now known that individual organisms have unique genome sequences and that differences between these sequences are the reason for genetic diversity. Furthermore, the biomolecular processes of living organisms are steered by genes and the interplay of their products. Perturbations in these systems often lead to disease. Thus, one of the major questions in biomedical research is how genetic variations influence gene function, and how these affect underlying biological pathways and gene interaction networks. One of the most common sources of genetic diversity is single nucleotide variations (SNVs). Genome-Wide Association Studies (GWAS) as well as expression Quantitative Trait Locus (eQTL) studies aim to associate SNVs with, for example, disease-related binary or quantitative traits. However, available methods are usually limited to statistical analyses, and previous approaches to improve the interpretation of the respective results are often insufficient. The goal of this dissertation was the development of new visual analytical approaches to assist purely statistical methods in the identification, characterization, and interpretation of SNVs. Genomic variations, especially SNVs, also play an important role in the rapidly growing field of paleogenetics, where DNA of ancient origin is compared to modern DNA in order to gain insights into evolutionary history. In this dissertation, a computational pipeline for comparative NGS analyses of ancient and modern DNA samples is described. Special attention was given to the read merging step, which is required to cope with the quality limitations inherent to ancient DNA (aDNA), in particular DNA fragmentation and nucleotide misincorporation. In addition, aDNA is usually only retrievable in low amounts and is often contaminated with DNA of modern microorganisms.
To address this issue, a highly economical microarray-based DNA capturing strategy has been developed for the parallel detection and enrichment of aDNA from up to 100 different human pathogens.

    Machine learning of genomic profiles

    This thesis concerns machine learning and its application to genomic profiles. Machine learning is a subfield of computer science concerned with the analysis and design of algorithms that derive rules and patterns from data sets. Genomic profiles describe alterations of the DNA, e.g. in its copy number. Tumour diseases are often caused by these genomic alterations. Various machine learning methods are examined for their applicability to genomic profiles. In addition, a loss function for survival time data is designed. Subsequently, an analytical framework is developed to find aberration patterns associated with a specific tumour disease. The framework comprises preprocessing, feature selection, and discretization of genomic profiles, as well as strategies for handling missing values and a multidimensional analysis. Finally, the classifier is trained and analysed. This thesis also presents an explanation component that identifies the features important for the classification of a case and provides a measure of the correctness of a classification. Such an explanation component can form the basis for integrating a classifier, e.g. a support vector machine, into a decision support system. The methods developed in this thesis were successfully applied to biological questions such as early metastasis or micrometastasis and led to the discovery of previously unknown tumour markers. In summary, the results of this thesis show that machine learning methods contribute to knowledge about genomic alterations and point to possibilities for further improving therapy for tumour patients.

    Tracking Cancer Genetic Evolution using OncoTrack

    It is difficult for existing methods to quantify and track the constant evolution of cancers because of the high heterogeneity of mutations. However, structural variations associated with nucleotide number changes show repeatable patterns in localized regions of the genome. Here we introduce SPKMG, which generalizes nucleotide-number-based properties of genes, in statistical terms, at the genome-wide scale. It is measured from the normalized amount of aligned NGS reads in the exonic regions of a gene. SPKMG values are calculated within OncoTrack. Because SPKMG values are continuous numeric variables, they provide a statistical metric to track DNA-level changes. We show that SPKMG measures of cancer DNA show a normative pattern at the genome-wide scale. The analysis leads to the discovery of core cancer genes and also provides novel dynamic insights into the stage of cancer, including cancer development, progression, and metastasis. This technique will allow exome data to also be used for quantitative LOH/CNV analysis for tracking tumour progression and evolution with a higher efficiency. The final version of this article, as published in Scientific Reports, can be viewed online at: https://www.nature.com/articles/srep2964

    MALDI-ToF mass spectrometry biomarker profiling via multivariate data analysis application in the biopharmaceutical bioprocessing industry

    PhD Thesis. Matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-ToF MS) is a technique by which protein profiles can be rapidly produced from biological samples. Proteomic profiling and biomarker identification using MALDI-ToF MS have been utilised widely in microbiology for bacteria identification and in clinical proteomics for disease-related biomarker discovery. To date, the benefits of MALDI-ToF MS have not been realised in the area of mammalian cell culture during bioprocessing. This thesis explores the approach of ‘intact-cell’ MALDI-ToF MS (ICM-MS) combined with projection to latent structures – discriminant analysis (PLS-DA) to discriminate between mammalian cell lines during bioprocessing. Specifically, the industrial collaborator, Lonza Biologics, is interested in adopting this approach to discriminate between IgG monoclonal antibody-producing Chinese hamster ovary (CHO) cell lines based on their productivities and to identify protein biomarkers associated with cell line productivity. After classifying cell lines into two categories (high/low producers; Hs/Ls), it is hypothesised that Hs and Ls CHO cells exhibit different metabolic profiles and hence that differences in phenotypic expression patterns will be observed. The protein expression patterns correlate with the productivities of the cell lines and introduce between-class variability. The chemometric method of PLS-DA can use this variability to classify the cell lines as Hs or Ls. A number of differentially expressed proteins were matched and identified as biomarkers after a SwissProt/TrEMBL protein database search. The identified proteins revealed that proteins involved in biological processes such as protein biosynthesis, protein folding, glycolysis, and cytoskeleton architecture were upregulated in Hs.
This study demonstrates that ICM-MS combined with PLS-DA and a protein database search can be a rapid and valuable tool for biomarker discovery in the bioprocessing industry. It may help provide clues to potential cell genetic engineering targets and serve as a tool for process development in the bioprocessing industry. With the completion of the sequencing of the CHO genome, this study provides a foundation for rapid biomarker profiling of CHO cell lines in culture during recombinant protein manufacturing. Lonza Biologics