
    Role of Asparagine as a Nitrogen Signal and Characterization of a Nitrogen Responsive Glutamine Amidotransferase, GAT1_2.1 in Arabidopsis thaliana

    Maintaining the proper balance between carbon (C) and nitrogen (N) metabolism is critical to the sustained growth of organisms. In plant leaves, this balance is achieved by photoperiod dependent cross-talk between the processes of photosynthesis, respiration, and amino acid metabolism. A crucial mechanism in maintaining C/N balance is the GS/GOGAT cycle, which is well known to serve as a cross-road between C and N metabolism. Importantly, non-photosynthetic tissues (e.g. roots, germinating seeds) lack a sufficient supply of carbon skeletons under high N conditions and hence may resort to other mechanisms, along with the GS/GOGAT cycle, to achieve proper C/N balance. Our understanding of the pathways involved in this aspect of plant regulation is limited. Considering the importance of asparagine as a major storage form of nitrogen, this study examines C and N partitioning within Arabidopsis roots upon asparagine treatment. Based on this work, I propose a role for the enzyme GAT1_2.1 in hydrolyzing excess glutamine to glutamic acid (Glu), which may serve as a carbon skeleton for channeling C to the TCA cycle under high N conditions. GAT1_2.1, a gene coding for a class I glutamine amidotransferase of unknown substrate specificity, was shown to be highly responsive to N status and has a root specific expression in Arabidopsis. The protein localizes to the mitochondria and the gene is found to be highly co-expressed with Glutamate Dehydrogenase 2 (GDH2). Metabolite profiling data using a gat1_2.1 mutant of Arabidopsis suggests that, in the absence of GAT1_2.1, the GABA shunt pathway is activated to replenish the depleted levels of Glu. This Glu may then be deaminated to 2-oxoglutarate by GDH2 and channeled into the TCA cycle, thus providing a cross-roads between C and N metabolism in root mitochondria. 
In addition to this work, I also elucidate optimal methods for reliable metabolomics experiments and propose the use of isotopic labelling for the detection of unknown pathways

    The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases

    One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in the biofuel, medicine and food industries due to their capability of acting on a large range of substrates for cleaving ester bonds and synthesizing high-added-value molecules through esterification and transesterification reactions. During the past two decades extensive studies have been carried out on the production and partial characterization of FAEs from fungi, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted to sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to every poorly characterized enzyme family. A total of 365 FAE-related sequences of fungal, bacterial and plant origin were collected and clustered into distinct groups using Self Organizing Maps followed by k-means clustering, based on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequences. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work involves the development of pharmacophore models for the new FAE families for which sufficient information on known substrates existed. 
Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a certain family opens a window of opportunities for tailored applications of FAEs
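
To make the descriptor step of this abstract concrete, the following is a minimal sketch of an amino acid composition descriptor together with the nearest-centroid assignment that k-means performs at each iteration. It is an illustration only: the function names are hypothetical, the Self Organizing Map and Support Vector Machine stages are not shown, and the study's physico-chemical descriptors are not reproduced here.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def aa_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence.

    Hypothetical helper: non-standard symbols (X, B, Z, gaps) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def euclidean(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_centroid(vector, centroids):
    """Index of the closest centroid (the assignment step of k-means)."""
    return min(range(len(centroids)), key=lambda i: euclidean(vector, centroids[i]))
```

Every sequence, whatever its length, maps to a 20-dimensional vector, which is what allows sequences of different sizes to be clustered together.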

    A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis

    Background: Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machine (SVM) are the most effective and accurate methods for solving these problems. A key step to improve the performance of the SVM-based methods is to find a suitable representation of protein sequences. Results: In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from the protein sequence frequency profiles. The protein sequence frequency profiles are calculated from the multiple sequence alignments output by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by the occurrence times of each Top-n-gram. The training vectors are evaluated by SVM to train classifiers which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), which is an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results compared to related methods. Conclusion: The method based on Top-n-grams significantly outperforms the methods based on many other building blocks including N-grams, patterns, motifs and binary profiles. Therefore, Top-n-gram is a good building block of the protein sequences and can be widely used in many tasks of the computational biology, such as the sequence alignment, the prediction of domain boundary, the designation of knowledge-based potentials and the prediction of protein binding sites
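
One plausible reading of the Top-n-gram construction described above can be sketched as follows. The sketch assumes that a frequency profile is given as a list of per-position dictionaries mapping residues to frequencies, that the n most frequent residues at a position are joined in descending order of frequency, and that ties are broken alphabetically. These conventions and the function names are assumptions for illustration; the PSI-BLAST, LSA and SVM stages are not shown.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def top_n_gram(column, n):
    """Join the n most frequent residues of one profile column.

    `column` maps residue -> frequency; missing residues count as 0.
    Ties are broken alphabetically (an assumption of this sketch).
    """
    ranked = sorted(AMINO_ACIDS, key=lambda aa: (-column.get(aa, 0.0), aa))
    return "".join(ranked[:n])


def top_n_gram_vector(profile, n):
    """Count the occurrences of each Top-n-gram over all profile positions.

    The vector has one entry per ordered n-letter word over the 20 residues,
    so its dimension is fixed regardless of the length of the protein.
    """
    counts = Counter(top_n_gram(column, n) for column in profile)
    vocabulary = ["".join(word) for word in product(AMINO_ACIDS, repeat=n)]
    return [counts[word] for word in vocabulary]
```

With n = 1 the representation reduces to counting the most frequent residue at each alignment position; larger n retains more of the evolutionary information in each column at the cost of a larger feature space.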

    Integrated mining of feature spaces for bioinformatics domain discovery

    One of the major challenges in the field of bioinformatics is the elucidation of protein folding for the functional annotation of proteins. The factors that govern protein folding include the chemical, physical, and environmental conditions of the protein's surroundings, which can be measured and exploited for computational discovery purposes. These conditions enable the protein to transform from a sequence of amino acids to a globular three-dimensional structure. Information concerning the folded state of a protein has significant potential to explain biochemical pathways and their involvement in disorders and diseases. This information impacts the ways in which genetic diseases are characterized and cured and in which designer drugs are created. With the exponential growth of protein databases and the limitations of experimental protein structure determination, sophisticated computational methods have been developed and applied to search for, detect, and compare protein homology. Most computational tools developed for protein structure prediction are primarily based on sequence similarity searches. These approaches have improved the prediction accuracy of high sequence similarity proteins but have failed to perform well with proteins of low sequence similarity. Data mining offers unique algorithmic computational approaches that have been used widely in the development of automatic protein structure classification and prediction. In this dissertation, we present a novel approach for the integration of physico-chemical properties and effective feature extraction techniques for the classification of proteins. Our approaches overcome one of the major obstacles of data mining in protein databases, the encapsulation of different hydrophobicity residue properties into a much reduced feature space that possesses high degrees of specificity and sensitivity in protein structure classification. 
We have developed three unique computational algorithms for coherent feature extraction on selected scale properties of the protein sequence. When plagued by the problem of the unequal cardinality of proteins, our proposed integration scheme effectively handles the varied sizes of proteins and scales well with increasing dimensionality of these sequences. We also detail a two-fold methodology for protein functional annotation. First, we exhibit our success in creating an algorithm that provides a means to integrate multiple physico-chemical properties in the form of a multi-layered abstract feature space, with each layer corresponding to a physico-chemical property. Second, we discuss a wavelet-based segmentation approach that efficiently detects regions of property conservation across all layers of the created feature space. Finally, we present a unique graph-theory based algorithmic framework for the identification of conserved hydrophobic residue interaction patterns using identified scales of hydrophobicity. We report that these discriminatory features are specific to a family of proteins, which consist of conserved hydrophobic residues that are then used for structural classification. We also present our rigorously tested validation schemes, which report significant degrees of accuracy to show that homologous proteins exhibit the conservation of physico-chemical properties along the protein backbone. We conclude our discussion by summarizing our results and contributions and by listing our goals for future research
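
As a rough illustration of how a hydrophobicity scale turns a sequence into a numeric layer, and of one simple way to cope with proteins of unequal length, the sketch below uses the Kyte-Doolittle hydropathy scale with a sliding-window average, followed by averaging into a fixed number of bins. The choice of scale, the window size and the binning step are all assumptions made for this example; the dissertation's own algorithms (the multi-layered feature space, wavelet-based segmentation and graph-based framework) are not reproduced.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Sliding-window mean hydropathy along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def resample(profile, length):
    """Average a profile into `length` bins (hypothetical size normalization).

    Proteins of different sizes then yield vectors of equal length.
    """
    if not profile or length < 1:
        raise ValueError("profile must be non-empty and length positive")
    n = len(profile)
    bins = []
    for b in range(length):
        start = b * n // length
        end = max((b + 1) * n // length, start + 1)  # never an empty bin
        chunk = profile[start:end]
        bins.append(sum(chunk) / len(chunk))
    return bins
```

Stacking one such profile per physico-chemical property would give a layered representation in the spirit of the abstract, although the layout used in the dissertation may differ.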

    A computational intelligence analysis of G protein-coupled receptor sequences for pharmacoproteomic applications

    Arguably, drug research has contributed more to the progress of medicine during the past decades than any other scientific factor. One of the main areas of drug research is related to the analysis of proteins. The world of pharmacology is becoming increasingly dependent on the advances in the fields of genomics and proteomics. This dependency brings about the challenge of finding robust methods to analyze the complex data they generate. Such a challenge invites us to go one step further than traditional statistics and resort to approaches under the conceptual umbrella of artificial intelligence, including machine learning (ML), statistical pattern recognition and soft computing methods. Sound statistical principles are essential to trust the evidence base built through the use of such approaches. Statistical ML methods are thus at the core of the current thesis. More than 50% of drugs currently available target only four key protein families, of which almost 30% correspond to the G Protein-Coupled Receptors (GPCR) superfamily. This superfamily regulates the function of most cells in living organisms and is at the centre of the investigations reported in the current thesis. Not much is known about the 3D structure of these proteins. Fortunately, plenty of information regarding their amino acid sequences is readily available. The automatic grouping and classification of GPCRs into families, and of these into subtypes, based on sequence analysis may significantly contribute to ascertaining the pharmaceutically relevant properties of this protein superfamily. There is no biologically relevant manner of representing the symbolic sequences describing proteins using real-valued vectors. This does not preclude the possibility of analyzing them using principled methods. These may come, amongst others, from the field of statistical ML. In particular, kernel methods can be used for this purpose. 
Moreover, the visualization of high-dimensional protein sequence data can be a key exploratory tool for finding meaningful information that might be obscured by their intrinsic complexity. That is why the objective of the research described in this thesis is twofold: first, the design of adequate visualization-oriented artificial intelligence-based methods for the analysis of GPCR sequential data, and second, the application of the developed methods in relevant pharmacoproteomic problems such as GPCR subtyping and alignment-free protein analysis.
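
The point that symbolic sequences can be compared with kernel methods without first aligning them can be illustrated with a k-mer spectrum kernel. This is a generic, well-known string kernel chosen for the example; the abstract does not say which kernels the thesis uses, and the function names and the default k = 3 are assumptions.

```python
from collections import Counter
import math


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(s, t, k=3):
    """Inner product of the k-mer count vectors of two sequences."""
    counts_s, counts_t = kmer_counts(s, k), kmer_counts(t, k)
    return sum(count * counts_t[kmer] for kmer, count in counts_s.items())


def normalized_spectrum_kernel(s, t, k=3):
    """Cosine-normalized spectrum kernel, in [0, 1] for count vectors."""
    denom = math.sqrt(spectrum_kernel(s, s, k) * spectrum_kernel(t, t, k))
    return spectrum_kernel(s, t, k) / denom if denom else 0.0
```

Because the kernel depends only on shared k-mers, it needs no alignment, and the resulting kernel matrix can be passed to any kernel-based classifier or projection method.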

    Identification of the Schistosoma mansoni TNF-Alpha Receptor Gene and the Effect of Human TNF-Alpha on the Parasite Gene Expression Profile

    Schistosoma mansoni is the major causative agent of schistosomiasis in the Americas. This parasite takes advantage of host signaling molecules such as cytokines and hormones to complete its development inside the host. Tumor necrosis factor-alpha (TNF-α) is one of the most important host cytokines involved in the inflammatory response. When cercariae, the infective stage, penetrate the human skin, the release of TNF-α begins. In this work the authors describe the complete sequence of a possible TNF-α receptor in S. mansoni and find that, among all life cycle stages, the receptor is most highly expressed in cercariae. Aiming to mimic the situation at the site of skin penetration, cercariae were mechanically transformed in vitro into schistosomula and exposed to human TNF-α. Exposure of early-developing schistosomula to the human cytokine caused a large-scale change in the expression of parasite genes. Exposure of adult worms to human TNF-α caused gene expression changes as well, and the set of altered genes in the adult parasite was different from that in schistosomula. This work increases the number of known signaling pathways of the parasite, and opens new perspectives into understanding the molecular components of the TNF-α response as well as into possibly interfering with parasite–host interaction

    Quantitative and functional post-translational modification proteomics reveals that TREPH1 plays a role in plant thigmomorphogenesis

    Plants can sense both intracellular and extracellular mechanical forces and can respond through morphological changes. The signaling components responsible for mechanotransduction of the touch response are largely unknown. Here, we performed a high-throughput SILIA (stable isotope labeling in Arabidopsis)-based quantitative phosphoproteomics analysis to profile changes in protein phosphorylation resulting from 40 seconds of force stimulation in Arabidopsis thaliana. Of the 24 touch-responsive phosphopeptides identified, many were derived from kinases, phosphatases, cytoskeleton proteins, membrane proteins and ion transporters. TOUCH-REGULATED PHOSPHOPROTEIN1 (TREPH1) and MAP KINASE KINASE 2 (MKK2) and/or MKK1 became rapidly phosphorylated in touch-stimulated plants. Both TREPH1 and MKK2 are required for touch-induced delayed flowering, a major component of thigmomorphogenesis. The treph1-1 and mkk2 mutants also exhibited defects in touch-inducible gene expression. A non-phosphorylatable site-specific isoform of TREPH1 (S625A) failed to restore touch-induced flowering delay of treph1-1, indicating the necessity of S625 for TREPH1 function and providing evidence consistent with the possible functional relevance of the touch-regulated TREPH1 phosphorylation. Bioinformatic analysis and biochemical subcellular fractionation of TREPH1 protein indicate that it is a soluble protein. Altogether, these findings identify new protein players in Arabidopsis thigmomorphogenesis regulation, suggesting that protein phosphorylation may play a critical role in plant force responses

    Characterizing the transcriptional regulation of crassulacean acid metabolism in Kalanchoe

    Due to the agricultural challenges posed by the prospect of a hotter, drier climate, understanding the molecular basis of plant water-use efficiency is of increasing importance. Species performing crassulacean acid metabolism (CAM) photosynthesis have evolved to be naturally water-use efficient, primarily through shifting their carbon uptake to the night to minimize water loss. Relative to C3 and C4 photosynthesis species, CAM plants are enriched for rhythmic circadian clock-dependent regulation of metabolic processes. However, the transcriptional regulation of CAM remains largely uncharacterized. Using Kalanchoe fedtschenkoi, in which CAM develops along a leaf developmental gradient, candidate transcription factors with possible CAM-related functions were identified. The mRNA abundance of these transcription factors increases upon the transition from C3 photosynthesis to CAM and they appear to exhibit a circadian phase-dependent pattern of regulation. To better characterize the transcriptional control circuits underlying CAM, three of these transcription factors, KfNF-YB3, KfHomeodomain-like, and KfMYB59, were selected for chromatin immunoprecipitation-sequencing (ChIP-seq). However, these experiments failed to identify enriched target genomic loci, possibly as a consequence of the unique challenges of adapting experimental protocols designed for model C3 photosynthesis plant species to a succulent plant such as Kalanchoe. Additionally, this work focuses on elucidating the cis-regulatory elements and the trans-acting factors governing the transcriptional control of the phosphoenolpyruvate carboxylase gene (Ppc1) in Kalanchoe. Despite this enzyme’s importance in catalyzing the primary nocturnal fixation of CO2 in CAM species, the complex regulatory mechanisms underlying its expression are not well studied. 
We examined the Kalanchoe Ppc1 promoter and identified numerous cis-regulatory elements on the basis of their sequence conservation with known regulatory modules. These individual elements, along with 200-base-pair segments of the Kalanchoe Ppc1 promoter, were used as bait probes in yeast one-hybrid (Y1H) assays. From this analysis, several high-confidence interacting transcriptional regulators were identified, including ERF9, ERF106, TCP4, and PIF1. In silico examination of the Ppc1 promoter revealed likely binding sites for these factors based on homology to validated preferred binding sequences in Arabidopsis. The specific transcription factors identified through this work can now serve as the basis for further experiments to confirm interaction with the Ppc1 promoter and elucidate the nature of their regulatory effects. Overall, the work presented in this dissertation attempts to investigate the transcriptional control of crassulacean acid metabolism using the developmental CAM model Kalanchoe