52 research outputs found

    Identification of proteins similar to AvrE type III effector proteins from Arabidopsis thaliana genome with partial least squares

    Type III effector proteins are injected into host cells through type III secretion systems. Some effectors are similar to host proteins to promote pathogenicity, while others lead to the activation of disease resistance. We used partial least squares alignment-free bioinformatics methods to identify proteins similar to AvrE proteins in the Arabidopsis thaliana genome and identified 61 protein candidates. Using information from the Genevestigator, Arabidopsis GEB, KEGG, GEO (accession number GSE22274), and AraCyc databases, we highlighted 16 protein candidates from the Arabidopsis genome for further investigation.
    Keywords: Partial least squares, Type III effectors, AvrE, and Arabidopsis
    African Journal of Biotechnology Vol. 12(39), pp. 5804-580

    Evolution of the Kdo2-lipid A biosynthesis in bacteria

    Background: Lipid A is the highly immunoreactive endotoxic center of lipopolysaccharide (LPS). It anchors the LPS into the outer membrane of most Gram-negative bacteria. Lipid A can be recognized by animal cells, triggers defense-related responses, and causes Gram-negative sepsis. The biosynthesis of Kdo2-lipid A, the LPS substructure, involves nine enzymatic steps. Results: In order to elucidate the evolutionary pathway of Kdo2-lipid A biosynthesis, we examined the distribution of genes encoding the nine enzymes across bacteria. We found that not all Gram-negative bacteria have all nine enzymes. Some Gram-negative bacteria have no genes encoding these enzymes and others have genes only for the first four enzymes (LpxA, LpxC, LpxD, and LpxB). Among the nine enzymes, five appeared to have arisen from three independent gene duplication events. Two of these events happened within the Proteobacteria lineage, followed by functional specialization of the duplicated genes and pathway optimization in these bacteria. Conclusions: The nine-enzyme pathway, which was established based on studies mainly in Escherichia coli K12, appears to be the most derived and optimized form. It is found only in E. coli and related Proteobacteria. Simpler and probably less efficient pathways are found in other bacterial groups, with Kdo2-lipid A variants as the likely end products. The Kdo2-lipid A biosynthetic pathway exemplifies extremely plastic evolution of bacterial genomes, especially those of Proteobacteria, and how these mainly pathogenic bacteria have adapted to their environment.

    Mining the Arabidopsis thaliana genome for highly-divergent seven transmembrane receptors

    To identify divergent seven-transmembrane receptor (7TMR) candidates from the Arabidopsis thaliana genome, multiple protein classification methods were combined, including both alignment-based and alignment-free classifiers. This resolved problems in optimally training individual classifiers using limited and divergent samples, and increased stringency for candidate proteins. We identified 394 proteins as 7TMR candidates and highlighted 54 with corresponding expression patterns for further investigation

    Comparative genome analyses of four rice-infecting Rhizoctonia solani isolates reveal extensive enrichment of homogalacturonan modification genes

    Background: Plant pathogenic isolates of Rhizoctonia solani anastomosis group 1-intraspecific group IA (AG1-IA) infect a wide range of crops, causing diseases such as rice sheath blight (ShB). ShB has become a serious disease in rice production worldwide. Additional genome sequences of the rice-infecting R. solani isolates from different geographical regions will facilitate the identification of important pathogenicity-related genes in the fungus. Results: Rice-infecting R. solani isolates B2 (USA), ADB (India), WGL (India), and YN-7 (China) were selected for whole-genome sequencing. Single-Molecule Real-Time (SMRT) and Illumina sequencing were used for de novo sequencing of the B2 genome. The genomes of the other three isolates were then sequenced with Illumina technology and assembled using the B2 genome as a reference. The four genomes ranged from 38.9 to 45.0 Mbp in size, contained 9715 to 11,505 protein-coding genes, and shared 5812 conserved orthogroups. The proportion of transposable elements (TEs) and the average length of TE sequences in the B2 genome were nearly 3 times and 2 times greater, respectively, than those of ADB, WGL and YN-7. Although 818 to 888 putative secreted proteins were identified in the four isolates, only 30% of them were predicted to be small secreted proteins, which is a smaller proportion than what is usually found in the genomes of cereal necrotrophic fungi. Despite a lack of putative secondary metabolite biosynthesis gene clusters, the rice-infecting R. solani genomes were predicted to contain the most carbohydrate-active enzyme (CAZyme) genes among all 27 fungal genomes used in the comparative analysis. Specifically, extensive enrichment of pectin/homogalacturonan modification genes was found in all four rice-infecting R. solani genomes. Conclusion: Four R. solani genomes were sequenced, annotated, and compared to other fungal genomes to identify distinctive genomic features that may contribute to the pathogenicity of rice-infecting R. solani. Our analyses provided evidence that genomic conservation of R. solani genomes among neighboring AGs was more diversified than among AG1-IA isolates, and that numerous predicted pectin modification genes are present in the rice-infecting R. solani genomes, which may contribute to the wide host range and virulence of this necrotrophic fungal pathogen.
    This research was supported by a Ph.D. fellowship awarded to D.-Y. Lee by the Monsanto Beachell-Borlaug International Scholarship Program (MBBISP) as well as grants from the National Research Foundation of Korea to YHL (NRF-2020R1A2B5B03096402, NRF-2015M3A9B8028679, and NRF2018R1A5A1023599), the Korea Institute of Planning and Evaluation for Technology in Food, Agriculture, and Forestry through the Agricultural Microbiome Program to YHL (918017–04), and the USDA Hatch Project to GLW. KTK and JK are grateful for a graduate fellowship through the Brain Korea 21 Plus Program.

    Inferring causal molecular networks: empirical assessment through a community-based effort

    Inferring molecular networks is a central challenge in computational biology. However, it has remained unclear whether causal, rather than merely correlational, relationships can be effectively inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge that focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results constitute the most comprehensive assessment of causal network inference in a mammalian setting carried out to date and suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess the causal validity of inferred molecular networks

    Inferring causal molecular networks: empirical assessment through a community-based effort

    It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense

    Crowdsourced assessment of common genetic contribution to predicting anti-TNF treatment response in rheumatoid arthritis

    Correction: vol 7, 13205, 2016, doi:10.1038/ncomms13205
    Rheumatoid arthritis (RA) affects millions worldwide. While anti-TNF treatment is widely used to reduce disease progression, treatment fails in about one-third of patients. No biomarker currently exists that identifies non-responders before treatment. A rigorous community-based assessment of the utility of SNP data for predicting anti-TNF treatment efficacy in RA patients was performed in the context of a DREAM Challenge (http://www.synapse.org/RA_Challenge). An open challenge framework enabled the comparative evaluation of predictions developed by 73 research groups using the most comprehensive available data and covering a wide range of state-of-the-art modelling methodologies. Despite a significant genetic heritability estimate for the treatment non-response trait (h² = 0.18, P value = 0.02), no significant genetic contribution to prediction accuracy is observed. The results formally confirm the expectations of the rheumatology community that SNP information does not significantly improve predictive performance relative to standard clinical traits, thereby justifying a refocusing of future efforts on the collection of other data.
    Peer reviewed

    Protein family classification using multivariate methods

    The number of protein sequences from agriculturally important crops is rapidly increasing in databases. In order to identify their functions efficiently and accurately, good computational methods are needed. Commonly used methods search databases using alignments. Some proteins may lack sufficient sequence similarity even though they share similar structures and biochemical functions. In such cases, alignment-based methods fail to identify proteins correctly. In order to classify these difficult proteins, alignment-free methods based on, e.g., multivariate methods are required. I examined the application of two multivariate methods: principal component analysis (PCA) and partial least squares (PLS). Their performance was compared against profile hidden Markov models (HMMs) and PSI-BLAST. G-protein coupled receptors (GPCRs), cyclophilins, cytochrome b561 (Cyt b561), and immunoglobulin protein families were included in this study. Using physico-chemical properties as descriptors, I examined how the training dataset affects the performance of the methods, how well the methods identify short fragmented sequences, and how well the methods identify proteins when only remotely similar samples are included in the training sets. The PLS methods outperformed profile HMM and PSI-BLAST when only a small number of positive samples (5 or 10) were included in the training dataset. PLS methods also performed better than profile HMM and PSI-BLAST in the identification of short fragmented sequences and of Cyt b561 expressed sequence tags from the Arabidopsis genome. Combining the results of PLS with other alignment-free methods, 342 proteins were identified as GPCR candidates, including 20 of the 22 known Arabidopsis GPCRs. Profile HMM identified only 15 of them. The PLS method with descriptors selected by the t-test outperformed the PLS method with descriptors from auto and cross-covariance in identifying cyclophilins from the Arabidopsis and rice genomes.
    Finally, I developed a simple statistics method (ST-method) that is sensitive to proteins with weak sequence similarities and generates few false positives. The ST-method outperformed PLS methods, profile HMMs, and PSI-BLAST in the classification of GPCRs and the immunoglobulin superfamily. It identified 579, 717, and 382 GPCR candidates from the Arabidopsis, rice, and maize genomes, respectively.
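
    As a rough illustration of how an alignment-free descriptor can turn sequences of different lengths into fixed-length numeric vectors, the sketch below computes auto-covariance terms of a single physico-chemical property along a sequence. It is not the method from the work above: the choice of the Kyte-Doolittle hydropathy scale, the function name `auto_covariance`, and the lag range are all assumptions made for the example.

```python
# Kyte-Doolittle hydropathy values; using this particular scale is an assumption
# for illustration, not a descriptor set taken from the abstract above.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(sequence, max_lag=3, scale=KYTE_DOOLITTLE):
    """Return [AC(1), ..., AC(max_lag)] for one property along a protein sequence.

    AC(lag) is the mean product of mean-centered property values at positions
    i and i + lag, so the output length depends only on max_lag.
    """
    values = [scale[residue] for residue in sequence]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


if __name__ == "__main__":
    # Sequences of different lengths map to vectors of the same length.
    for seq in ("ACDEFGHIKL", "ACDEFGHIKLMNPQRSTVWY"):
        print(len(seq), auto_covariance(seq, max_lag=3))
```

    Vectors of this kind, computed for several properties and lags, could then serve as the input columns of a multivariate classifier such as PLS.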

    Evolution of the Kdo2-lipid A Biosynthesis in Bacteria

    Background: Lipid A is the highly immunoreactive endotoxic center of lipopolysaccharide (LPS). It anchors the LPS into the outer membrane of most Gram-negative bacteria. Lipid A can be recognized by animal cells, triggers defense-related responses, and causes Gram-negative sepsis. The biosynthesis of Kdo2-lipid A, the LPS substructure, involves nine enzymatic steps. Results: In order to elucidate the evolutionary pathway of Kdo2-lipid A biosynthesis, we examined the distribution of genes encoding the nine enzymes across bacteria. We found that not all Gram-negative bacteria have all nine enzymes. Some Gram-negative bacteria have no genes encoding these enzymes and others have genes only for the first four enzymes (LpxA, LpxC, LpxD, and LpxB). Among the nine enzymes, five appeared to have arisen from three independent gene duplication events. Two of these events happened within the Proteobacteria lineage, followed by functional specialization of the duplicated genes and pathway optimization in these bacteria. Conclusions: The nine-enzyme pathway, which was established based on studies mainly in Escherichia coli K12, appears to be the most derived and optimized form. It is found only in E. coli and related Proteobacteria. Simpler and probably less efficient pathways are found in other bacterial groups, with Kdo2-lipid A variants as the likely end products. The Kdo2-lipid A biosynthetic pathway exemplifies extremely plastic evolution of bacterial genomes, especially those of Proteobacteria, and how these mainly pathogenic bacteria have adapted to their environment.