84 research outputs found

    Trustworthiness and metrics in visualizing similarity of gene expression

    BACKGROUND: Conventionally, the first step in analyzing the large and high-dimensional data sets measured by microarrays is visual exploration. Dendrograms of hierarchical clustering, self-organizing maps (SOMs), and multidimensional scaling have been used to visualize similarity relationships of data samples. We address two central properties of the methods: (i) Are the visualizations trustworthy, i.e., if two samples are visualized to be similar, are they really similar? (ii) The metric. The measure of similarity determines the result; we propose using a new learning metrics principle to derive a metric from interrelationships among data sets. RESULTS: The trustworthiness of hierarchical clustering, multidimensional scaling, and the self-organizing map were compared in visualizing similarity relationships among gene expression profiles. The self-organizing map was the best except that hierarchical clustering was the most trustworthy for the most similar profiles. Trustworthiness can be further increased by treating separately those genes for which the visualization is least trustworthy. We then proceed to improve the metric. The distance measure between the expression profiles is adjusted to measure differences relevant to functional classes of the genes. The genes for which the new metric is the most different from the usual correlation metric are listed and visualized with one of the visualization methods, the self-organizing map, computed in the new metric. CONCLUSIONS: The conjecture from the methodological results is that the self-organizing map can be recommended to complement the usual hierarchical clustering for visualizing and exploring gene expression data. Discarding the least trustworthy samples and improving the metric still improves it.

    Transcriptome of Saccharomyces cerevisiae during production of D-xylonate

    BACKGROUND: Production of D-xylonate by the yeast S. cerevisiae provides an example of bioprocess development for sustainable production of value-added chemicals from cheap raw materials or side streams. Production of D-xylonate may lead to considerable intracellular accumulation of D-xylonate and to loss of viability during the production process. In order to understand the physiological responses associated with D-xylonate production, we performed transcriptome analyses during D-xylonate production by a robust recombinant strain of S. cerevisiae which produces up to 50 g/L D-xylonate. RESULTS: Comparison of the transcriptomes of the D-xylonate producing and the control strain showed considerably higher expression of the genes controlled by the cell wall integrity (CWI) pathway and of some genes previously identified as up-regulated in response to other organic acids in the D-xylonate producing strain. Increased phosphorylation of Slt2 kinase in the D-xylonate producing strain also indicated that D-xylonate production caused stress to the cell wall. Surprisingly, genes encoding proteins involved in translation, ribosome structure and RNA metabolism, processes which are commonly down-regulated under conditions causing cellular stress, were up-regulated during D-xylonate production, compared to the control. The overall transcriptional responses were, therefore, very dissimilar to those previously reported as being associated with stress, including stress induced by organic acid treatment or production. Quantitative PCR analyses of selected genes supported the observations made in the transcriptomic analysis. In addition, consumption of ethanol was slower and the level of trehalose was lower in the D-xylonate producing strain, compared to the control. 
CONCLUSIONS: The production of organic acids has a major impact on the physiology of yeast cells, but the transcriptional responses to presence or production of different acids differ considerably, being much more diverse than responses to other stresses. D-Xylonate production apparently imposed considerable stress on the cell wall. Transcriptional data also indicated that activation of the PKA pathway occurred during D-xylonate production, leaving cells unable to adapt normally to stationary phase. This, together with intracellular acidification, probably contributes to cell death. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-763) contains supplementary material, which is available to authorized users.

    Re-annotation of the CAZy genes of Trichoderma reesei and transcription in the presence of lignocellulosic substrates

    BACKGROUND: Trichoderma reesei is a soft rot Ascomycota fungus utilised for industrial production of secreted enzymes, especially lignocellulose degrading enzymes. About 30 carbohydrate active enzymes (CAZymes) of T. reesei have been biochemically characterised. Genome sequencing has revealed a large number of novel candidates for CAZymes, thus increasing the potential for identification of enzymes with novel activities and properties. Plenty of data exists on the carbon source dependent regulation of the characterised hydrolytic genes. However, information on the expression of the novel CAZyme genes, especially on complex biomass material, is very limited. RESULTS: In this study, the CAZyme gene content of the T. reesei genome was updated and the annotations of the genes refined using both computational and manual approaches. Phylogenetic analysis was done to assist the annotation and to identify functionally diversified CAZymes. The analyses identified 201 glycoside hydrolase genes, 22 carbohydrate esterase genes and five polysaccharide lyase genes. Updated or novel functional predictions were assigned to 44 genes, and the phylogenetic analysis indicated further functional diversification within enzyme families or groups of enzymes. GH3 β-glucosidases, GH27 α-galactosidases and GH18 chitinases were especially functionally diverse. The expression of the lignocellulose degrading enzyme system of T. reesei was studied by cultivating the fungus in the presence of different inducing substrates and by subjecting the cultures to transcriptional profiling. The substrates included both defined and complex lignocellulose related materials, such as pretreated bagasse, wheat straw, spruce, xylan, Avicel cellulose and sophorose. The analysis revealed co-regulated groups of CAZyme genes, such as genes induced in all the conditions studied and also genes induced preferentially by a certain set of substrates.
CONCLUSIONS: In this study, the CAZyme content of the T. reesei genome was updated, the discrepancies between the different genome versions and published literature were removed and the annotation of many of the genes was refined. Expression analysis of the genes gave information on the enzyme activities potentially induced by the presence of the different substrates. Comparison of the expression profiles of the CAZyme genes under the different conditions identified co-regulated groups of genes, suggesting common regulatory mechanisms for the gene groups.

    Whole-genome metabolic model of Trichoderma reesei built by comparative reconstruction

    Background: Trichoderma reesei is one of the main sources of biomass-hydrolyzing enzymes for the biotechnology industry. There is a need for improving its enzyme production efficiency. The use of metabolic modeling for the simulation and prediction of this organism's metabolism is potentially a valuable tool for improving its capabilities. An accurate metabolic model is needed to perform metabolic modeling analysis. Results: A whole-genome metabolic model of T. reesei has been reconstructed together with metabolic models of 55 related species using the metabolic model reconstruction algorithm CoReCo. The previously published CoReCo method has been improved to obtain better quality models. The main improvements are the creation of a unified database of reactions and compounds and the use of reaction directions as constraints in the gap-filling step of the algorithm. In addition, the biomass composition of T. reesei has been measured experimentally to build and include a specific biomass equation in the model. Conclusions: The improvements presented in this work on the CoReCo pipeline for metabolic model reconstruction resulted in higher-quality metabolic models compared with previous versions. A metabolic model of T. reesei has been created and is publicly available in the BIOMODELS database. The model contains a biomass equation, reaction boundaries and uptake/export reactions which make it ready for simulation. To validate the model, we demonstrate that the model is able to predict biomass production accurately and no stoichiometrically infeasible yields are detected. The new T. reesei model is ready to be used for simulations of protein production processes. Peer reviewed.

    Evolutionary Conservation of Orthoretroviral Long Terminal Repeats (LTRs) and ab initio Detection of Single LTRs in Genomic Data

    Background: Retroviral LTRs, paired or single, influence the transcription of both retroviral and non-retroviral genomic sequences. Vertebrate genomes contain many thousand endogenous retroviruses (ERVs) and their LTRs. Single LTRs are difficult to detect from genomic sequences without recourse to repetitiveness or presence in a proviral structure. Understanding of LTR structure increases understanding of LTR function, and of functional genomics. Here we develop models of orthoretroviral LTRs useful for detection in genomes and for structural analysis. Principal Findings: Although mutated, ERV LTRs are more numerous and diverse than exogenous retroviral (XRV) LTRs. Hidden Markov models (HMMs), and alignments based on them, were created for HML- (human MMTV-like), general-beta-, gamma- and lentiretroviruslike LTRs, plus a general-vertebrate LTR model. Training sets were XRV LTRs and RepBase LTR consensuses. The HML HMM was most sensitive and detected 87% of the HML LTRs in human chromosome 19 at 96% specificity. By combining all HMMs with a low cutoff, for screening, 71% of all LTRs found by RepeatMasker in chromosome 19 were found. HMM consensus sequences had a conserved modular LTR structure. Target site duplications (TG-CA), TATA (occasionally absent), an AATAAA box and a T-rich region were prominent features. Most of the conservation was located in, or adjacent to, R and U5, with evidence for stem loops. Several of the long HML LTRs contained long ORFs inserted after the second A rich module. HMM consensus alignment allowed comparison of functional features like transcriptional start sites (sense and antisense) between XRVs and ERVs. Conclusion: The modular conserved and redundant orthoretroviral LTR structure with three A-rich regions is reminiscent of structurally relaxed Giardia promoters. 
The five HMMs provided a novel broad range, repeat-independent, ab initio LTR detection, with prospects for greater generalisation, and insight into LTR structure, which may aid development of LTR-targeted pharmaceuticals. Peer reviewed.

    Real-world treatment outcomes and safety of natalizumab in Finnish multiple sclerosis patients

    Objectives: The primary objective was to evaluate long-term treatment persistence and safety of natalizumab in Finnish multiple sclerosis patients. The secondary objectives were to assess patient characteristics, use of natalizumab-related safety protocol, and treatment persistence in patients with different anti-John Cunningham virus antibody statuses (John Cunningham virus status). Materials & Methods: All adult multiple sclerosis patients in the Finnish multiple sclerosis register who started natalizumab between 1/2006 and 12/2018 were included in this study and followed retrospectively until treatment discontinuation or end of follow-up (12/2019). Results: In total, 850 patients were included. Median duration of natalizumab treatment was 7.8 years in John Cunningham virus negative (n = 229) and 2.1 years in John Cunningham virus positive patients (n = 115; p < 0.001). The most common cause for treatment discontinuation was John Cunningham virus positivity. After natalizumab discontinuation, patients who had a washout duration of less than 6 weeks had fewer relapses during the first 6 months (p = 0.012) and 12 months (p = 0.005) compared with patients who had a washout duration of over 6 weeks. During the median follow-up of 3.6 years, 76% of patients remained stable or improved on their Expanded Disability Status Scale. Conclusions: Treatment persistence was very high among John Cunningham virus negative patients. The study supports long-term effectiveness of natalizumab and a washout duration of less than 6 weeks after discontinuation. Peer reviewed.

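    Factors to consider in the drug therapy of an elderly person with type 2 diabetes: guide cards for home care in the Raahe region joint municipal authority for wellbeing (Ikääntyneen, tyypin 2 diabeetikon lääkehoidossa huomioitavia tekijöitä : Opaskortit Raahen seudun hyvinvointikuntayhtymän kotihoitoon)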

    In the future, people will be cared for at home for longer, and those receiving care will increasingly have multiple illnesses. The Raahe home care area has an estimated 500 clients, about a third of whom have type 2 diabetes. The clinical picture of type 2 diabetes includes a weakened effect of insulin (insulin resistance) and disturbed insulin secretion, as a result of which blood sugar rises. Type 2 diabetes is particularly challenging in elderly people, because its treatment must take into account other concurrent diseases and their treatment as well as the complications caused by diabetes. This poses a challenge for home care workers. The aim of our thesis is to make the work of Raahe home care staff easier by preparing guide cards that support the drug therapy of people over 65 years of age with type 2 diabetes. The theoretical part of the thesis was compiled from material published by the diabetes association (Diabetesliitto), the Terveyskirjasto health library, the Käypä hoito (Current Care) guidelines, geriatric literature and other sources on diabetes. The content of the thesis was discussed with geriatrician Marja-Liisa Karjula and the nurses of the Raahe diabetes outpatient clinic. The product of the thesis process was a set of clear and simple guide cards that are easy to carry. The guide cards collect basic information on those aspects of care for people over 65 with type 2 diabetes in which errors have been found to occur. Care practices are easy to check from the guide cards, and the cards also support employees who work in the role for only a short time.

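    Exploratory methods for analyzing genomic data, with human endogenous retroviruses as the application (Eksploratiivisia menetelmiä genomitiedon analysointiin – sovelluskohteena ihmisen endogeeniset retrovirukset)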

    In this thesis exploratory data analysis methods have been developed for analyzing genomic data, in particular human endogenous retrovirus (HERV) sequences and gene expression data. HERVs are remains of ancient retrovirus infections and now reside within the human genome. Little is known about their functions. However, HERVs have been implicated in some diseases. This thesis provides methods for analyzing the properties and expression patterns of HERVs. Nowadays the genomic data sets are so large that sophisticated data analysis methods are needed in order to uncover interesting structures in the data. The purpose of exploratory methods is to help in generating hypotheses about the properties of the data. For example, by grouping together genes behaving similarly, and hence presumably having similar function, a new function can be suggested for previously uncharacterized genes. The hypotheses generated by exploratory data analysis can be verified later in more detailed studies. In contrast, a detailed analysis of all the genes of an organism would be too time consuming and expensive. In this thesis self-organizing map (SOM) based exploratory data analysis approaches for visualization and grouping of gene expression profiles and HERV sequences are presented. The SOM-based analysis is complemented with estimates on reliability of the SOM visualization display. New measures are developed for estimating the relative reliability of different parts of the visualization. Furthermore, methods for assessing the reliability of groups of samples manually extracted from a visualization display are introduced. Finally, a new computational method is developed for a specific problem in HERV biology. Activities of individual HERV sequences are estimated from a database of expressed sequence tags using a hidden Markov mixture model. 
The model is used to analyze the activity patterns of HERVs.