44 research outputs found

    A new computational framework for the classification and function prediction of long non-coding RNAs

    Long non-coding RNAs (lncRNAs) are known to play a significant role in several biological processes. These RNAs are longer than 200 base pairs (bp), and so are often misclassified as protein-coding genes. Most Coding Potential Computation (CPC) tools fail to accurately identify, classify and predict the biological functions of lncRNAs in plant genomes, because previous research has been limited to mammalian genomes. In this thesis, various sequence and codon-bias features for the identification of lncRNA sequences are investigated and extracted in order to develop a new CPC framework. To identify essential features, the framework implements regularisation-based selection. A novel classification algorithm is implemented, which removes the dependency on experimental datasets and provides a coordinate-based solution for sub-classification of lncRNAs. To impute lncRNA functions, lncRNA-protein interactions were first determined through co-expression of genes and then re-analysed by a sequence similarity-based approach to identify novel interactions and predict lncRNA functions in the genome. The framework integrates a D3-based application for visualisation of lncRNA sequences and their associated functions in the genome. Standard evaluation metrics such as accuracy, sensitivity, and specificity have been used to benchmark the performance of the framework against leading CPC tools. Case study analyses were conducted with plant RNA-seq datasets to evaluate the effectiveness of the framework using a cross-validation approach. The tests show that the framework can provide significant improvements over existing CPC models for plant genomes: 20-40% greater accuracy. Function prediction analysis demonstrates that the results are consistent with published experimental findings.
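
The benchmarking metrics named in the abstract above (accuracy, sensitivity, specificity) can be illustrated with a minimal sketch. The function names and the convention that label 1 means "lncRNA" are assumptions made for illustration; this is not the thesis's evaluation code.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, since accuracy alone can look high for a classifier that rarely predicts the minority class.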

    Shedding some light over the floral metabolism by Arum Lily (Zantedeschia aethiopica) Spathe de novo transcriptome assembly

    Zantedeschia aethiopica is an evergreen perennial plant cultivated worldwide and commonly used for ornamental and medicinal purposes, including the treatment of bacterial infections. However, the current understanding of molecular and physiological mechanisms in this plant is limited in comparison to other non-model plants. To improve understanding of the biology of this species, RNA-Seq technology was used for transcriptome assembly and characterization. Following RNA extraction from Z. aethiopica spathe tissue, high-throughput RNA sequencing was performed with the aim of obtaining both abundant and rare transcript data. Functional profiling based on KEGG Orthology (KO) analysis highlighted contigs that were involved predominantly in genetic information (37%) and metabolism (34%) processes. Predicted proteins involved in the plant circadian system, hormone signal transduction, secondary metabolism and basal immunity are described here. In silico screening of the transcriptome data set for antimicrobial peptide (AMP)-encoding sequences was also carried out, and three lipid transfer proteins (LTPs) were identified as potential AMPs involved in plant defense. Predicted protein maps of the spathe were drawn and suggested that major plant efforts are expended in maintaining cell homeostasis, characterized by high investment in carbohydrate, amino acid and energy metabolism as well as in genetic information.

    Machine learning methods for omics data integration

    High-throughput technologies produce genome-scale transcriptomic and metabolomic (omics) datasets that allow for system-level studies of complex biological processes. The limitation lies in the small number of samples relative to the much larger number of features represented in these datasets. Machine learning methods can help integrate these large-scale omics datasets and identify key features from each dataset. A novel class-dependent feature selection method integrates the F statistic, maximum relevance binary particle swarm optimization (MRBPSO), and a class-dependent multi-category classification (CDMC) system. A set of highly differentially expressed genes is pre-selected using the F statistic as a filter for each dataset. MRBPSO and CDMC function as a wrapper to select desirable feature subsets for each class and classify the samples using those chosen class-dependent feature subsets. The results indicate that the class-dependent approaches can effectively identify unique biomarkers for each cancer type and improve classification accuracy compared to class-independent feature selection methods. The integration of transcriptomics and metabolomics data is based on a classification framework. Compared to integration approaches based on principal component analysis and non-negative matrix factorization, the proposed method achieves 20-30% higher prediction accuracies on Arabidopsis tissue development data. Metabolite-predictive genes and gene-predictive metabolites are selected from transcriptomic and metabolomic data respectively. The constructed gene-metabolite correlation network can infer the functions of unknown genes and metabolites. Tissue-specific genes and metabolites are identified by the class-dependent feature selection method. Evidence from subcellular locations, gene ontology, and biochemical pathways supports the involvement of these entities in different developmental stages and tissues in Arabidopsis.
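
The filter step described in the abstract above (pre-selecting features with the F statistic) can be sketched as follows. This covers only a generic one-way ANOVA F filter; the MRBPSO and CDMC wrapper stages are not reproduced, and the function names and the dictionary layout for the expression data are assumptions made for illustration.

```python
def f_statistic(values, labels):
    """One-way ANOVA F statistic for a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_features(expression, labels, n_keep):
    """Rank features (name -> list of per-sample values) by F and keep the top n_keep."""
    ranked = sorted(expression,
                    key=lambda name: f_statistic(expression[name], labels),
                    reverse=True)
    return ranked[:n_keep]
```

A filter of this kind is cheap because each feature is scored independently, which is why it is commonly placed before a more expensive wrapper search.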

    Identification And Functional Characterization Of Plant Small Secreted Proteins During Arbuscular Mycorrhizal Symbiosis

    Plant small secreted proteins (SSPs) are sequences of 50-250 amino acids that are transported out of cells to fulfill multiple functions related to plant growth and development and to responses to various stresses. With the development of more accurate and affordable genome sequencing technology, an increasing number of SSPs have been predicted using diverse computational tools based on machine learning. Although experimentally validated plant SSPs are still limited, some studies have reported that plant SSPs can be induced by, and involved in, mutualistic relationships between plants and microbes. In Chapter I, known SSPs and their functions in various plant species are reviewed. Additionally, current computational tools and experimental methods that have been widely applied to identify plant SSPs are summarized. A new, robust, and integrated pipeline to discover plant SSPs is proposed. Furthermore, strategies for elucidating the biological functions of SSPs in plants are discussed in Chapter I. Chapter II presents predicted SSPs from 60 plant species and elucidates the evolutionary convergence of changes in SSP sequences. Furthermore, the expression of SSPs induced by arbuscular mycorrhizal fungi (AMF), which corresponds to the convergent ability of different plants to form mutualistic associations with AMF, is explored. Overall, this study provides insights into the functions of plant SSPs during symbiosis between plants and fungi.

    A modeling platform to predict cancer survival and therapy outcomes using tumor tissue derived metabolomics data.

    Cancer is a complex and broad disease that is challenging to treat, partially due to the vast molecular heterogeneity among patients even within the same subtype. Currently, no reliable method exists to determine which potential first-line therapy would be most effective for a specific patient, as randomized clinical trials have concluded that no single regimen may be significantly more effective than others. One ongoing challenge in the field of oncology is the search for personalization of cancer treatment based on patient data. With an interdisciplinary approach, we show that tumor-tissue derived metabolomics data can predict, via machine-learning analysis, clinical response to systemic therapy classified as disease control vs. progressive disease (AUROC = 0.970) and pathological stage classified as stage I/II/III vs. stage IV (AUROC = 0.902). Patient survival was also analyzed via statistical methods and machine learning, both of which show that tumor-tissue derived metabolomics data can risk-stratify patients in terms of long vs. short survival (OS test AUROC = 0.940; PFS test AUROC = 0.875). A set of key metabolites as potential biomarkers and associated metabolic pathways was also found for each outcome, which may lead to insight into biological mechanisms. Additionally, we developed a methodology to calibrate tumor growth related parameters in a well-established mathematical model of cancer to help predict the potential nuances of chemotherapeutic response. The proposed methodology shows results consistent with clinical observations in predicting individual patient response to systemic therapy and helps lay the foundation for further investigation into the calibration of mathematical models of cancer with patient-tissue derived molecular data. Chapters 6 and 8 were published in the Annals of Biomedical Engineering. Chapters 2, 3, and 7 were published in Metabolomics, Lung Cancer, and Pharmaceutical Research, respectively. Chapter 4 has been accepted for publication in the journal Metabolomics (in press) and Chapter 5 is in review at the journal Metabolomics. Chapter 9 is currently being prepared for submission.
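
The AUROC figures quoted in the abstract above summarise how well a score separates two outcome classes. As a hedged sketch (not the authors' pipeline), AUROC can be computed directly from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the 0/1 label encoding are assumptions.

```python
def auroc(labels, scores):
    """Area under the ROC curve from its pairwise (Mann-Whitney) definition.

    labels: iterable of 0 (negative) or 1 (positive)
    scores: iterable of classifier scores, higher meaning "more positive"
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))
```

The quadratic loop is adequate for cohort-sized data; a rank-based implementation would be preferable for very large sample counts.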

    Role of phytoplasma effector proteins in plant development and plant-insect interactions

    Phytoplasmas are insect-transmitted plant pathogenic bacteria that dramatically alter plant development. The phytoplasma virulence protein (effector) SAP54 mediates degradation of host MADS-box transcription factors (MTFs) via the 26S proteasome shuttle protein RAD23 to abolish normal flower development and produce leaf-like flowers (phyllody). Phyllody is a common symptom in phytoplasma-infected plants worldwide. Why do phytoplasmas degrade MTFs and induce phyllody? Are the changes in host plant morphology adaptive, and do they benefit phytoplasma spread? Because phytoplasmas rely on their insect (leafhopper) vectors for transmission from plant to plant, I hypothesized that the vegetative tissues of the leaf-like flowers render plants more attractive to the insect vectors, which would aid phytoplasma dispersal in nature. I discovered that the induction of phyllody is genetically linked with enhanced insect egg-laying preference on infected plants that exhibit the leaf-like flower phenotype. However, SAP54 enhances insect colonisation of plants independently of floral transition and the changes in plant morphology. Interestingly, male leafhoppers are required for the preference of females to lay eggs on SAP54 plants. Moreover, SAP54 suppresses insect-induced plant responses in a sex-specific manner by selectively downregulating male-induced defence and secondary metabolism pathways. Furthermore, I identified four MTFs that are expressed in plant leaves, play important roles in egg-laying preferences by leafhoppers, and show sex-specific regulation by SAP54. Taken together, the phytoplasma effector SAP54 enhances insect vector colonisation of plants by suppressing insect-induced plant responses independently of developmental changes. This is likely to occur by targeting MTFs, conserved regulators of both plant development and plant defence against herbivorous insects. In addition to developmental changes, degradation of MTFs by SAP54 may result in modulation of male-induced plant responses to attract female insects for egg-laying and aid phytoplasma spread in nature.

    Predicting plant environmental exposure using remote sensing

    Wheat is one of the most important crops globally, with 776.4 million tonnes produced in 2019 alone. However, 10% of all wheat yield is predicted to be lost to Septoria Tritici Blotch (STB) caused by Zymoseptoria tritici (Z. tritici). Throughout Europe, farmers spend £0.9 billion annually on preventative fungicide regimes to protect wheat against Z. tritici. A preventative fungicide regime is used because Z. tritici has a 9-16 day asymptomatic latent phase, which makes it difficult to detect before symptoms develop, after which point fungicide intervention is ineffective. In the second chapter of my thesis, I use hyperspectral sensing and imaging techniques, analysed with machine learning, to detect and predict symptomatic Z. tritici infection in winter wheat in UK-based field trials with high accuracy. This has the potential to improve detection and monitoring of symptomatic Z. tritici infection and could facilitate precision agriculture methods, for use in the subsequent growing season, that optimise fungicide use and increase yield. In the third chapter of my thesis, I develop a multispectral imaging system which can detect and utilise non-visible shifts in plant leaf reflectance to distinguish plants based on the nitrogen source applied. Currently, plants are treated with nitrogen sources to increase growth and yield, the most common being calcium ammonium nitrate. However, some nitrogen sources are used in illicit activities. Ammonium nitrate is used in explosive manufacture, and ammonium sulphate in the cultivation and extraction of the narcotic cocaine from Erythroxylum spp. In my third chapter I show that hyperspectral sensing, multispectral imaging, and machine learning image analysis can be used to visualise and differentiate plants exposed to different nefarious nitrogen sources. Metabolomic analysis of leaves from plants exposed to different nitrogen sources reveals shifts in coloured metabolites that may contribute to altered reflectance signatures. This suggests that different nitrogen feeding regimes alter plant secondary metabolism, leading to changes in plant leaf reflectance that are detectable via machine learning of multispectral data but not by the naked eye. These results could facilitate the development of technologies to monitor illegal activities involving various nitrogen sources and further inform nitrogen application requirements in agriculture. In my fourth chapter I implement and adapt the hyperspectral sensing, multispectral imaging and machine learning image analysis developed in the third chapter to detect asymptomatic (and symptomatic) Z. tritici infection in winter wheat in UK-based field trials with high accuracy. This has the potential to improve detection and monitoring of all stages of Z. tritici infection and could facilitate precision agriculture methods, for use during the current growing season, that optimise fungicide use and increase yield.

    Characterization of PRL1 and its paralogue PRL2 in Arabidopsis thaliana

    The Arabidopsis gene PLEIOTROPIC REGULATORY LOCUS 1 (PRL1) encodes a protein that is a member of a conserved WDR protein family in eukaryotes. PRL1 mutations lead to sugar hypersensitivity, cause numerous pleiotropic changes in root, leaf and flower development, and trigger responses to abiotic and biotic stress stimuli by altering plant hormone homeostasis. In contrast to other eukaryotes, the Arabidopsis genome contains a PRL1 paralogue, PRL2, which shares a highly conserved C-terminal WDR domain with PRL1. Compared with PRL1, however, PRL2 has different N-terminal sequences. Nevertheless, the N-terminal regions of PRL1 and PRL2 show a structure that is exceptionally highly conserved throughout evolution in the plant kingdom. Studies in yeast and mammals have shown that PRL1 plays a key role in the activation of the spliceosome. Yet the divergence of the N-terminal PRL1 region in Arabidopsis implies plant-specific functions in splicing. One aim of this work was to provide insights into the regulatory functions of PRL1. Affymetrix ATH1 arrays and tiling arrays, together with RNA-Seq analysis, confirmed the genome-wide changes in the prl1 mutant. In addition, the upregulation of transposons together with the upregulation of many other genes suggests that PRL1 is also involved in the control of gene silencing. Furthermore, the use of modified PRL1 constructs showed that PRL1 transcription and PRL1 stability are stress-dependent and that PRL1 does not interfere with proteasome activity. A further aim was to assess the extent of functional overlap between PRL1 and PRL2. A transcriptional comparison of PRL1 and PRL2 showed that PRL2 is transcribed at a significantly lower level in most tissues. In flowers, PRL2 is expressed specifically in male reproductive tissue, in pollen grains, in the endosperm and in the zygote, whereas PRL1 is active in developing ovules and in the integument of developing seeds. The prl2 mutations are embryo-lethal, while somatic prl2 mosaics show a prl1-like phenotype. Exchanging the different N-terminal domains between PRL1 and PRL2 demonstrated the subfunctionalisation of PRL1 and PRL2. The N-terminal domain of PRL2 only partially complements the prl1 root phenotype. It can, however, rescue the embryonic lethality caused by prl2. Moreover, the N-terminal PRL1 domain together with the C-terminal PRL2 domain complements the prl1 phenotype but not the embryo-lethal prl2 phenotype. In addition to the differential transcription observed for PRL1 and PRL2, the diversification of the N-terminal domains has contributed to the subfunctionalisation of PRL1 and PRL2 in Arabidopsis. The results presented in this work further indicate that PRL1 and PRL2 play a specific role in the control of gene silencing in Arabidopsis.

    Mathematical modelling of SERK mediated BR signalling

    Being sessile by nature, plants are continuously challenged by biotic and abiotic stress factors. At the cellular level, different stimuli are perceived and translated to the desired response. In order to achieve this, signal transduction cascades have to be interlinked. Complex networks of downstream targets as well as positive and negative regulatory elements are essential for proper signal transduction. This often complicates analysis of signal transduction cascades via genetic approaches, as a mutation in one gene results in a pleiotropic phenotype. Pathway components can be placed in the signal transduction cascade based on genetics as well as biochemical interactions between proteins. This results in a signal transduction network which is based on Boolean logic: either the gene is present and functional (on) or it is mutated and not functional (off). In such a genetic scheme, intermediate conditions and the effect of concentration on pathway components are not taken into account. Also, the effect of intermediate or transient activation states on signal transduction pathways is rarely included. In principle, all proteins in a signal transduction network obey mass action laws, suggesting that the reaction rate, and as a consequence the output of the signal, depends on the concentration of the reactants. In addition, signals can be subjected to negative feedback, thereby resulting in a signal that attenuates itself to maintain cellular homeostasis, or only responds to a stimulus temporarily and only when required. At the cellular level, the cell has to decipher all stimuli to enable the desired integrated cellular response. In this respect, concentrations and amplitudes matter very much, as the plant must not respond to background noise. Hence a cellular response will only be induced when a signal is above a certain threshold, resulting in “switch-like” behaviour of the system. 
Mathematical modelling can help to visualise and explain the temporal and concentration effects of pathway components on the output of signal transduction cascades. In order to do so, the signal transduction cascade needs to be well described with a clear and measurable response. Obviously, to know how receptor concentration affects the signalling output one has to know its numbers, and this was the starting point of the work described in this thesis. The final challenge was to describe the modulatory effect of SERK co-receptors on BR signalling. For this, SERK mediated BRI1 signalling was incorporated in a mathematical model that describes root growth and hypocotyl elongation based on the BRI1 receptor activity. In Chapter 1 the brassinosteroid signalling pathway as well as the role of SERK co-receptors on BRI1 mediated signalling is described. BRs are perceived by the plasma membrane localised Brassinosteroid insensitive 1 (BRI1) receptor. For its signalling, BRI1 completely depends on the presence of non-ligand binding co-receptors of the somatic embryogenesis receptor like kinase (SERK) co-receptor family. An added complexity is that SERK co-receptors associate with different main ligand perceiving receptors thereby affecting multiple signalling pathways simultaneously. Therefore, it is important to know how SERK co-receptors modulate the output of the main ligand perceiving receptor and how SERK co-receptors are distributed between the signal transduction cascades. The BRI1 signal transduction pathway is one of the best understood signal transduction cascades in Arabidopsis with clearly described ligands and associated phenotypes. For this reason, the focus of this study was on how SERK co-receptors affect BRI1 mediated signalling quantitatively using a mathematical modelling approach. This requires knowledge on the concentration of the main ligand perceiving receptor, SERK co-receptor and ligand levels. 
Since the BRI1 and SERK co-receptor concentrations were unknown, we set out to quantify the number of receptors in a cell. In Chapter 2 a confocal microscopy-based method is described that enables quantification of BRI1, SERK1 and SERK3 in planta. The number of BRI1 receptor molecules in root epidermal cells ranges from 22,000 in the meristem to 80,000 in the maturation zone. However, when taking into account differences in cell size, the root meristem cells have the same receptor density, which reduces significantly in the maturation zone. The root meristem cells are thought to be most active in BR signalling, suggesting that receptor density rather than the total number of BRI1 receptors affects the sensitivity of a cell to BRs. The next question is how the physiological response of the cell depends on both ligand stimulation of the receptor and on ligand concentration. To address this, a mathematical modelling approach was employed in which the receptor and ligand concentrations were coupled to root growth and hypocotyl elongation as a downstream physiological readout for BR signalling (Chapter 3). Based on the BRI1 receptor activity, the model faithfully predicts root growth as observed in bri1 loss-of-function mutants. The model also predicts that a rather low number of receptor molecules is needed to initiate a physiological response. Interestingly, the “switch” between activation and inhibition of root growth depends on the BRI1 occupancy level. This suggests that BRI1 may be a core regulator that activates different targets depending on its occupancy level. Root growth is robust against reduction in the BRI1 receptor level but not against variation in the BR concentration. This indicates that BR signalling is mainly regulated via ligand availability and biochemical activity. Since BRI1 signalling is highly dependent on the presence of SERK co-receptors, it is important to determine how these co-receptors affect the signalling output. 
Therefore, in Chapter 4, the BRI1 receptor model was extended with two co-receptors, SERK1 and SERK3. The model also takes into account BRI1 signalling independent of SERK1 and SERK3. This may occur due to the activity of BRI1 alone, or due to interaction of BRI1 with another co-receptor, for example SERK4. It appears that roots of the serk1serk3 double mutant are almost completely unresponsive to BRs while the hypocotyl is not, suggesting either a difference in co-receptor usage or a higher activity of BRI1 alone in the hypocotyl. The usage of different co-receptors may reflect a mechanism by which the sensitivity of a cell to BRs is regulated. It appears that co-receptors mainly act by increasing the magnitude of the response. In addition, in silico simulations confirm that BRI1 signalling is not impaired when the majority of SERK co-receptors operate in other signalling pathways. The presented model provides a starting point to incorporate the effect of other modulators of the BRI1 signal transduction cascade on a complex physiological response. Current models for BRI1 mediated signalling postulate that SERK3 is recruited upon ligand binding. However, Fluorescence Recovery After Photobleaching (FRAP) measurements, described in Chapter 5, indicate that BRI1 receptors located in root meristem cells have a relatively low mobility. This suggests that BRI1 and SERK already form complexes in the absence of ligand. It has been repeatedly reported that SERK co-receptors are involved in various biological processes and signal transduction networks. In Chapter 6, the changes in gene expression in the absence of functional SERK1 and SERK3 are studied using transcriptional analysis. Microarray analyses were performed on RNA isolated from roots of 4-day-old seedlings of serk1, serk3 and serk1serk3 mutants. Hierarchical cluster analysis indicated that serk3 mutant roots have the same transcriptional pattern as roots of the serk1serk3 double mutant, but at a lower magnitude. 
More than half of the genes differentially regulated in the serk1serk3 double mutant relate to BRI1 mediated signalling. In addition, a number of BR-dependent and BR-independent metabolic processes are affected in the absence of SERK3, indicating that this co-receptor may have an additional function in metabolic control. Performing microarray analysis on receptor mutants is complicated, as effects on gene transcripts may be indirect and due to differential regulation of downstream transcriptional regulators. This complexity is further enhanced in the SERK co-receptor mutants, as multiple signalling pathways are affected. This raises the question of whether it is truly possible to correlate alterations in gene expression due to the absence of functional SERK co-receptors with one particular signal transduction pathway. In Chapter 7, the general discussion, it is described how modelling of BRI1 signalling in this thesis has contributed to new insights into brassinosteroid signalling. Microscopy has been an important tool to quantify the number of receptors in a cell or the number of cells in a tissue. What is still needed is a clear link between signalling activity, and therefore the physiological response of the cell, and local and intracellular protein-protein interactions and protein concentrations. Further expanding the available microscopic techniques and mathematical models to the cellular level is one of the next challenges. The research described in this thesis is a starting point for such an approach to study signal transduction in Arabidopsis.
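
The abstract above argues that signalling output follows mass action and depends on receptor occupancy. A minimal sketch of that idea, assuming simple one-site reversible binding (R + L <-> RL) with ligand in excess so that free ligand stays constant, is given below. It is not the thesis's BRI1/SERK model: the function names, rate constants, and any numbers passed in are hypothetical.

```python
def fractional_occupancy(ligand_conc, kd):
    """Equilibrium fraction of receptors bound: L / (Kd + L)."""
    if ligand_conc < 0 or kd <= 0:
        raise ValueError("ligand_conc must be >= 0 and kd must be > 0")
    return ligand_conc / (kd + ligand_conc)


def simulate_binding(total_receptors, ligand_conc, k_on, k_off, dt, steps):
    """Forward-Euler integration of dC/dt = k_on * (Rt - C) * L - k_off * C.

    C is the number of ligand-bound receptors, starting from zero.
    Assumes ligand is in excess (ligand_conc is held constant).
    """
    bound = 0.0
    for _ in range(steps):
        free = total_receptors - bound
        bound += dt * (k_on * free * ligand_conc - k_off * bound)
    return bound
```

With Kd defined as k_off / k_on, the simulated bound fraction should approach `fractional_occupancy(ligand_conc, kd)` at long times, which gives a simple consistency check between the kinetic and equilibrium descriptions.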

    Tissue and population-level diversity in plant secondary metabolism: a systematic exploration using MS/MS structural analysis

    Plants are amazing synthetic chemists that create diversified secondary metabolites, which play myriad ecological roles in their survival and reproductive fitness in nature. The structural complexity of secondary metabolism has severely hampered its functional analysis. The potential of MS-based metabolomics and of the large-scale acquisition of tandem MS (MS/MS) spectra is limited by the absence of straightforward classification and visualization pipelines that would allow secondary metabolites and the underlying pathways to be easily interpreted. From a mechanistic standpoint, the diversity of secondary metabolism is attributed to the multiplicity of genes in plant genomes. Yet the majority of metabolic gene functions remain unknown. In this thesis, I developed a workflow to systematically explore the diversity of secondary metabolism in Nicotiana attenuata, a metabolically rich ecological model plant. I first characterize the metabolic space of this model plant using the large-scale acquisition of MS/MS spectral information in a data-independent manner and the computational re-assembly of non-redundant MS/MS spectra. The resulting MS signatures were then aligned and visualized to rapidly formulate structural hypotheses. Using natural variation, I examined the correlations between jasmonate signaling and large-scale defense metabolism. The resulting correlation maps uncovered new metabolic layers in a plant’s jasmonate-mediated defensive arsenal. In the tissue-level exploration of secondary metabolite diversity, transcriptomic and metabolomic information and their variance, as analyzed by information theory, were used to predict the tissue-specific functions of genes responsible for the metabolic signatures.