309 research outputs found

    A cross-species transcriptomics approach to identify genes involved in leaf development

    Background: We used publicly available gene expression data to identify transcription factors and transcriptional modules (regulons) associated with leaf development in Populus. Different tissue types were compared to identify genes informative for discriminating leaf from non-leaf tissues. Transcriptional modules within this gene set were then identified in a much wider set of microarray data collected from leaves in a number of developmental, biotic, abiotic and transgenic experiments.

    Results: Transcription factors that were over-represented in leaf EST libraries and useful for discriminating leaves from other tissues were identified, revealing that the C2C2-YABBY, CCAAT-HAP3 and HAP5, MYB, and ZF-HD families are particularly important in leaves. The expression of transcriptional modules and transcription factors was examined across a number of experiments to select those that were particularly active during the early stages of leaf development. Two transcription factors were found to co-locate with previously published quantitative trait loci (QTL) for leaf length. We also found that miRNA family 396 may be important in the control of leaf development, with three members of the family co-locating with clusters of leaf development QTL.

    Conclusion: This work provides a set of candidate genes involved in the control and processes of leaf development. This resource can be used for a wide variety of purposes, such as informing the selection of candidate genes for association mapping or selecting targets for reverse genetics studies to further the understanding of the genetic control of leaf size and shape.
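
The abstract does not say which statistic was used to call a transcription factor over-represented in leaf EST libraries. A minimal sketch, assuming a one-sided hypergeometric test on EST counts (a common choice for this kind of question, not confirmed by the source), is shown below; the function names, the `alpha` threshold and the counts in the comments are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total ESTs across all libraries
    K: ESTs assigned to the transcription factor across all libraries
    n: total ESTs in the leaf libraries
    k: ESTs assigned to the transcription factor in the leaf libraries
    """
    if not (0 <= k <= min(n, K) and n <= N and K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of drawing k or more TF ESTs among the n leaf ESTs.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total


def is_over_represented(k, n, K, N, alpha=0.01):
    """Flag a TF whose leaf EST count exceeds expectation with tail probability < alpha."""
    expected = n * K / N
    return k > expected and hypergeom_upper_tail(k, n, K, N) < alpha


# Hypothetical usage: 12 of a TF's 40 ESTs fall in leaf libraries that hold
# 200 of 10,000 ESTs overall (expected count 0.8).
flag = is_over_represented(k=12, n=200, K=40, N=10000)
```

In practice the resulting p-values would also need a multiple-testing correction across all transcription factors tested.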

    Construction of a global map of human gene expression: the process, tools and analysis

    This thesis studies the human gene expression space using high-throughput gene expression data from DNA microarrays. In molecular biology, high-throughput techniques allow the expression of tens of thousands of genes to be measured numerically and simultaneously. In a single study, such data are traditionally obtained from a limited number of sample types with a small number of replicates. For organism-wide analysis, such data have been largely unavailable, and the global structure of the human transcriptome has remained unknown. This thesis introduces a human transcriptome map of different biological entities and an analysis of its general structure. The map is constructed from gene expression data from the two largest public microarray data repositories, GEO and ArrayExpress. Creating this map contributed to the development of ArrayExpress by identifying and retrofitting previously unusable and missing data and by improving access to its data. It also contributed to the creation of several new tools for microarray data manipulation and to the establishment of data exchange between GEO and ArrayExpress. Integrating the data for the global map required the creation of a new, large ontology of human cell types, disease states, organism parts and cell lines. The ontology was used in a new text-mining and decision-tree-based method for automatically converting human-readable free-text microarray data annotations into a categorised format. Data comparability, and the minimisation of the systematic measurement errors characteristic of each laboratory in this large cross-laboratory integrated dataset, were ensured by computing a range of microarray data quality metrics and excluding incomparable data. The structure of the global map of human gene expression was then explored by principal component analysis and hierarchical clustering, aided by heuristics and by a second purpose-built sample ontology.
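
The map's structure is explored with principal component analysis. To make the idea concrete, here is a minimal pure-Python sketch of the first principal component computed by power iteration on a mean-centred samples-by-genes matrix. This is an illustrative assumption rather than the thesis's implementation (a map of this size would normally be decomposed with an SVD routine from a numerical library), and the function name is hypothetical.

```python
import random


def first_principal_component(samples, iterations=200, seed=0):
    """First principal component of a samples-by-genes matrix via power iteration.

    samples: list of equal-length lists, one expression profile per sample.
    Returns (loading vector over genes, score of each sample on that component).
    """
    n, p = len(samples), len(samples[0])
    # Centre each gene across samples.
    means = [sum(s[j] for s in samples) / n for j in range(p)]
    X = [[s[j] - means[j] for j in range(p)] for s in samples]

    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iterations):
        # w = X^T (X v) is proportional to (covariance matrix) @ v.
        xv = [sum(row[j] * v[j] for j in range(p)) for row in X]
        w = [sum(X[i][j] * xv[i] for i in range(n)) for j in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # all samples identical: no variance to explain
            break
        v = [x / norm for x in w]
    # Project each centred sample onto the component.
    scores = [sum(row[j] * v[j] for j in range(p)) for row in X]
    return v, scores
```

The sign of a principal component is arbitrary, so only the relative positions of samples along it are meaningful.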
    The construction and analysis of the global map are prefaced and motivated by an analysis of two microarray datasets of human malignant melanoma. The analysis of these datasets incorporates an indirect comparison of statistical methods for finding differentially expressed genes and points to the need to study gene expression on a global level.

    All cells of a multicellular organism contain the same set of genes. The appearance and function of cells are determined by which combinations of genes are active. Gene expression in a cell can be measured with high-throughput molecular biology methods such as DNA microarrays. A typical microarray experiment measures gene activity in a small number of different cell or tissue types. Studying gene expression with larger numbers of samples is often not possible, and differences in activity at the level of the whole organism remain unknown. This thesis presents a map of human gene activity across hundreds of cell and tissue types and examines its structure. The data examined were collected from more than 200 separate studies, originate from more than 160 laboratories, and contain information on gene expression in normal and diseased cell and tissue types. The map was created by combining data from the world's two largest microarray databases (GEO and ArrayExpress). Creating this map contributed to the development of ArrayExpress by improving researchers' access to the data and by correcting known errors in it. It also contributed to the development of computational tools for manipulating microarray data and to establishing data exchange between GEO and ArrayExpress. Processing and analysing large amounts of data is only possible if the data are organised systematically. The descriptions of the biological samples in the gene expression map were therefore systematised by replacing the original sample descriptions with a few highly informative keywords. These keywords were further organised hierarchically, and this hierarchy was then used to group samples automatically for data visualisation and analysis. It is known that the differences observed in the gene expression of a biological sample are larger when measurements are performed in two different laboratories than when a measurement is repeated in the same laboratory. Because the data used to build the comprehensive gene expression map came from many laboratories, it was important to ensure that this so-called laboratory effect did not bias the analysis results. For this reason, all data used to create the map were carefully checked for quality and comparability. The original motivation for establishing a comprehensive map of human gene expression came from the analysis of two malignant skin cancer (melanoma) datasets. The aim of the skin cancer study was to identify genes whose activity is linked to the malignant cell type. The search for these genes revealed the limitations of using small numbers of cell and tissue samples and the need for a more comprehensive study of gene expression.

    Meeting Highlights: Plant, Animal and Microbe Genomes X

    Analysis of large-scale molecular biological data using self-organizing maps

    Modern high-throughput technologies such as microarrays, next-generation sequencing and mass spectrometry produce huge amounts of data per measurement and challenge traditional analyses. New strategies for data processing, visualization and functional analysis are therefore needed. This thesis presents an approach based on a machine learning technique known as self-organizing maps (SOMs). SOMs provide parallel sample- and feature-centered views of molecular phenotypes, combined with strong visualization and second-level analysis capabilities. We developed a comprehensive analysis and visualization pipeline based on SOMs. The unsupervised SOM mapping projects the initially large number of features, such as gene expression profiles, onto meta-feature clusters of similar, and hence potentially co-regulated, single features. This dimension reduction is achieved by re-weighting the primary information and, unlike simple filtering approaches, does not entail a loss of primary information. The meta-data provided by the SOM algorithm are visualized as intuitive mosaic portraits. Sample-specific properties and properties shared between samples emerge as a handful of localized spots in the portraits, each collecting a group of co-regulated and co-expressed meta-features. These characteristic color patterns reflect the data landscape of each sample and allow immediate identification of (meta-)features of interest. It will be demonstrated that SOM portraits transform large and heterogeneous sets of molecular biological data into an atlas of sample-specific texture maps that can be directly compared in terms of similarities and dissimilarities. Spot clusters of correlated meta-features can be extracted from the SOM portraits in a subsequent aggregation step. This spot clustering reduces the dimensionality of the data in two successive unsupervised steps, down to a handful of signature modules.
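
The mapping described above rests on the standard SOM training rule. A minimal pure-Python sketch of that rule applied to feature (gene) profiles follows, assuming a rectangular grid, a Gaussian neighbourhood and linearly decaying learning rate and radius; these choices, the function names and the parameters are illustrative assumptions, not the thesis's pipeline. Each node's weight vector plays the role of a meta-feature, and reading one sample's component across the grid gives that sample's mosaic portrait.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest (squared Euclidean) to profile x."""
    return min(weights, key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], x)))


def train_som(profiles, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM on feature profiles (one profile per gene, one value per sample).

    Returns a dict mapping each grid node (r, c) to its weight vector, i.e. its meta-feature.
    """
    rng = random.Random(seed)
    dim = len(profiles[0])
    # Initialise every node with a copy of a randomly chosen input profile.
    weights = {(r, c): list(rng.choice(profiles)) for r in range(rows) for c in range(cols)}
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2
    total_steps = epochs * len(profiles)
    step = 0
    for _ in range(epochs):
        order = list(profiles)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1 - frac)              # learning rate decays linearly towards 0
            sigma = sigma0 * (1 - frac) + 0.5  # neighbourhood radius shrinks towards 0.5
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma * sigma))  # Gaussian neighbourhood on the grid
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def portrait(weights, sample_index, rows, cols):
    """One sample's meta-feature values laid out on the SOM grid (its 'mosaic portrait')."""
    return [[weights[(r, c)][sample_index] for c in range(cols)] for r in range(rows)]
```

After training, each gene would be assigned to the meta-feature of its best-matching unit, so co-regulated genes end up in the same or neighbouring nodes.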
    Furthermore, we demonstrate that analysis techniques achieve enhanced resolution when applied to the meta-features. The improved discrimination power of meta-features in downstream analyses such as hierarchical clustering, independent component analysis or pairwise correlation analysis is attributed to essentially two facts: first, the set of meta-features better represents the diversity of patterns and modes inherent in the data, and second, it has better signal-to-noise characteristics than a comparable collection of single features. In addition to the pattern-driven feature selection in the SOM portraits, we apply statistical measures to detect features that differ significantly between sample classes. Implemented scoring measures supplement the basic SOM algorithm. Further, two variants of functional enrichment analysis are introduced that link sample-specific patterns of the meta-feature landscape with biological knowledge and support functional interpretation of the data based on the ‘guilt by association’ principle. Finally, case studies from different ‘omics’ realms are presented. In particular, molecular phenotype data derived from expression microarrays (mRNA, miRNA), sequencing (DNA methylation, histone modification patterns) or mass spectrometry (proteome), as well as genotype data (SNP microarrays), are analyzed. It is shown that the SOM analysis pipeline is widely applicable, covering purposes ranging from time series and treatment-vs.-control experiments to the discrimination of samples according to genotypic, phenotypic or taxonomic classifications.
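
The abstract does not name the statistical measures used to detect differential features between sample classes. As an illustration only, assuming Welch's t statistic as the score (one common choice, not confirmed by the source), the sketch below ranks meta-features between two sample classes; the function names and inputs are hypothetical.

```python
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for one feature's values in two sample classes.

    Each group needs at least two values; zero variance in both groups
    would make the standard error zero and raise ZeroDivisionError.
    """
    se = (variance(group_a) / len(group_a) + variance(group_b) / len(group_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / se


def rank_differential(features, class_a, class_b):
    """Rank features (e.g. SOM meta-features) by |t| between two sample classes.

    features: dict mapping feature name -> list of values, one per sample
    class_a, class_b: lists of sample indices belonging to each class
    """
    scores = {}
    for name, values in features.items():
        a = [values[i] for i in class_a]
        b = [values[i] for i in class_b]
        scores[name] = welch_t(a, b)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```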

    Towards a System Level Understanding of Non-Model Organisms Sampled from the Environment: A Network Biology Approach

    Acquiring and analysing datasets that combine multi-level omics and physiology from non-model species sampled in field populations is a formidable challenge that has so far prevented the application of systems biology approaches. If successful, such approaches could greatly improve our understanding of how populations of living organisms adapt to environmental stressors such as pollution and climate. Here we describe the first application of a network inference approach that integrates transcriptional, metabolic and phenotypic information representative of wild populations of the European flounder, sampled at seven estuarine locations in northern Europe with different degrees and profiles of chemical contamination. We identified network modules whose activity was predictive of environmental exposure and that linked molecular and morphometric indices. These sub-networks represented both known and candidate novel adverse outcome pathways representative of several aspects of human liver pathophysiology, such as liver hyperplasia, fibrosis and hepatocellular carcinoma. At the molecular level, these pathways were linked to TNF alpha, TGF beta, PDGF, AGT and VEGF signalling. More generally, the approach can be applied to model molecular mechanisms of compensatory adaptation across a wide range of scenarios in wild populations.
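
The abstract reports network modules whose activity predicted environmental exposure, but it does not define module activity. A minimal sketch, assuming activity is summarised as the mean per-gene z-score of a module's member genes and related to a single numeric exposure index by Pearson correlation; the names and this scoring scheme are illustrative assumptions, not the study's network-inference method.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Standardise each gene's profile across samples (mean 0, SD 1)."""
    out = {}
    for gene, values in expr.items():
        m, s = mean(values), pstdev(values)
        # A gene that never varies carries no signal; give it zeros.
        out[gene] = [(v - m) / s if s > 0 else 0.0 for v in values]
    return out


def module_activity(expr, module_genes):
    """Per-sample activity of a module: mean z-score of its member genes."""
    z = zscore_rows({g: expr[g] for g in module_genes})
    n_samples = len(next(iter(z.values())))
    return [mean(z[g][i] for g in module_genes) for i in range(n_samples)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```

With several sampling sites, one activity value per fish could be correlated with a hypothetical site-level contamination index to screen modules for association with exposure.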

    Data Integration for Regulatory Module Discovery

    Genomic data relating to the functioning of individual genes and their products are being produced rapidly using many diverse experimental techniques. Each type of data provides information on a specific aspect of cell regulation, and integrating these diverse data types is essential for identifying biologically relevant regulatory modules. In this thesis, we address this challenge by analysing the nature of these datasets and proposing new data integration techniques. Because microarray data are not available in the quantities required for valid inference, many researchers have taken a blind integrative approach in which data from diverse microarray experiments are merged. To assess the validity of this approach, we begin by studying the heterogeneity of microarray datasets. We used the KL divergence between individual dataset distributions, as well as an empirical technique we propose for calculating functional similarity between datasets. Our results indicate that datasets should not be integrated blindly: care must be taken to combine only similar types of data, and the choice of normalization method also matters. Next, we propose a semi-supervised spectral clustering method that integrates two diverse types of data for the task of gene regulatory module discovery. The technique uses constraints derived from DNA-binding, protein-protein interaction (PPI) and TF-gene interaction datasets to guide the spectral clustering of microarray experiments. Our results on yeast stress and cell-cycle microarray data indicate that this integration leads to more biologically significant results. Finally, we propose a technique that integrates datasets under the principle of maximum entropy. We argue that this is the most valid approach in an unsupervised setting, where there is no other evidence regarding the weights to assign to individual datasets.
    Our experiments with yeast microarray, PPI, DNA-binding and TF-gene interaction datasets show improved biological significance of the results.
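
The heterogeneity study compares datasets through the KL divergence between their distributions. A minimal sketch, assuming each dataset is summarised by a histogram of its expression values over shared bin edges, smoothed with a small pseudocount so the divergence stays finite, and symmetrised by summing both directions; the binning, smoothing and function names are assumptions, not the thesis's exact procedure.

```python
import bisect
import math


def binned_distribution(values, edges, pseudocount=1e-6):
    """Normalised histogram over fixed bin edges, smoothed so no bin is zero.

    Bins are [edges[i], edges[i+1]); the right edge is included in the last bin,
    and values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0.0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        if i == len(counts) and v == edges[-1]:
            i -= 1  # include the right edge in the last bin
        if 0 <= i < len(counts):
            counts[i] += 1
    smoothed = [c + pseudocount for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(a, b, edges):
    """Symmetrised KL divergence between the value distributions of two datasets."""
    p, q = binned_distribution(a, edges), binned_distribution(b, edges)
    return kl_divergence(p, q) + kl_divergence(q, p)
```

Large pairwise values would flag dataset pairs whose distributions differ too much to merge without further normalization.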

    Integrative Modeling of Transcriptional Regulation in Response to Autoimmune Disease Therapies

    Rheumatoid arthritis (RA) and multiple sclerosis (MS) are generally classified as autoimmune diseases. They are treated with immunomodulatory drugs, for example TNF-alpha blockers (e.g. Etanercept) in RA and IFN-beta preparations (e.g. Betaferon and Avonex) in MS. To date, the molecular mechanisms of these therapies remain largely unknown. Moreover, their efficacy and tolerability are insufficient in some patients. In this work, the transcriptional response in patients' blood to each of these three therapies was investigated in order to better understand how these drugs act. Network inference methods were used with the aim of reconstructing the gene regulatory networks (GRNs) of the differentially expressed genes. Each analysis started from a gene expression dataset, from which genes that were up- or down-regulated after the start of therapy were first filtered. The regulatory regions of these genes were then analysed for transcription factor binding sites (TFBS). Finally, a new network inference algorithm (TILAR) was used to derive GRN models. TILAR distinguishes between genes and TFs and describes the regulatory effects between them by a system of linear equations, and it allows prior knowledge about gene-TF and TF-gene interactions to be incorporated. As a result, complex network structures were reconstructed that describe the regulatory relationships between the genes differentially expressed over the course of the therapies. For the Etanercept therapy, a subnetwork was found containing genes that show lower expression levels in RA patients who respond very well to the drug. The analysis of GRNs can thus contribute to a better understanding of therapy-associated processes and reveal transcriptional differences between patients.
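
The exact TILAR formulation is not given in this summary. A much-simplified sketch of the underlying idea follows, assuming each differentially expressed gene's expression is fitted as a linear combination of TF activities, with the candidate regulators restricted by prior knowledge such as TFBS predicted in that gene's regulatory region. This is plain least squares via the normal equations, not TILAR itself, and all names are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_target(target, tf_activity, allowed_tfs):
    """Least-squares weights w with target[t] ~ sum_j w_j * tf_activity[j][t].

    Only TFs in allowed_tfs (the prior-knowledge edges, e.g. from TFBS
    predictions) may regulate the target; all other edges are fixed at zero.
    """
    cols = [tf_activity[tf] for tf in allowed_tfs]
    k = len(cols)
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(cols[i], target)) for i in range(k)]
    return dict(zip(allowed_tfs, solve(xtx, xty)))
```

Restricting each gene to its prior-supported regulators keeps the system small and well posed when, as in patient blood samples, there are far fewer time points than possible regulatory edges.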