
    Text-mining of PubMed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens

    Background: The Enteropathogen Resource Integration Center (ERIC; http://www.ericbrc.org) aims to provide bioinformatics support for the scientific community researching enteropathogenic bacteria such as Escherichia coli and Salmonella spp. Rapid and accurate identification of experimental conclusions from the scientific literature is critical to research in this field. Natural Language Processing (NLP), and in particular Information Extraction (IE) technology, can be a significant aid to this process.
    Description: We have trained a powerful, state-of-the-art IE technology on a corpus of abstracts from the microbial literature in PubMed to automatically identify and categorize biologically relevant entities and predicative relations. These relations include Genes/Gene Products and their Roles; Gene Mutations and the resulting Phenotypes; and Organisms and their associated Pathogenicity. Evaluations on blind datasets show an average F-measure greater than 90% for entities (genes, operons, etc.) and over 70% for relations (gene/gene product to role, etc.). This IE capability, combined with text indexing and relational database technologies, constitutes the core of our recently deployed text-mining application.
    Conclusion: Our text-mining application is available online on the ERIC website at http://www.ericbrc.org/portal/eric/articles. The information retrieval interface displays a list of recently published enteropathogen literature abstracts and also provides a search interface for running custom queries by keyword, date range, etc. When an abstract is selected, it and the entities and relations extracted from it are retrieved from a relational database and marked up to highlight those entities and relations. The displayed abstract also links the extracted genes and gene products to the ERIC Annotations database, providing access to comprehensive genomic annotations and adding value to both the text-mining and annotation systems.
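    The F-measure quoted above is, in its usual F1 form, the harmonic mean of precision and recall. The sketch below shows that arithmetic in Python; the function name and the counts are hypothetical illustrations, not figures from the ERIC evaluation, and the abstract does not say which F variant or averaging scheme was used.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true-positive, false-positive and false-negative counts.

    With beta = 1 (the default) this is F1, the harmonic mean of precision and recall.
    """
    if tp == 0:
        return 0.0  # no correct extractions: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration: 92 entities extracted correctly,
# 6 spurious extractions, 10 entities missed.
print(round(f_measure(tp=92, fp=6, fn=10), 3))  # prints 0.92
```

    With these counts precision is 92/98 and recall is 92/102, so F1 = 2·92 / (2·92 + 6 + 10) = 0.92.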

    Patterns of subnet usage reveal distinct scales of regulation in the transcriptional regulatory network of Escherichia coli

    The set of regulatory interactions between genes, mediated by transcription factors, forms a species' transcriptional regulatory network (TRN). By comparing this network with measured gene expression data, one can identify functional properties of the TRN and gain general insight into transcriptional control. We define the subnet of a node as the subgraph consisting of all nodes topologically downstream of that node, including the node itself. Using a large set of microarray expression data from the bacterium Escherichia coli, we find that gene expression in different subnets exhibits a structured pattern in response to environmental changes and genotypic mutation. Subnets with fewer changes in their expression pattern have a higher fraction of feed-forward loop motifs and a lower fraction of small RNA targets. Our study implies that the TRN consists of several scales of regulatory organization: (1) subnets with more variable gene expression, controlled by both transcription factors and post-transcriptional RNA regulation, and (2) subnets with less variable gene expression, having more feed-forward loops and less post-transcriptional RNA regulation.
    Comment: 14 pages, 8 figures, to be published in PLoS Computational Biology
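    The subnet definition above is a reachability query: start at a node and collect every node reachable by following regulatory edges downstream. A minimal Python sketch, assuming the TRN is stored as an adjacency dict from regulator to targets; the toy network, node labels and the feed-forward-loop counter are illustrative assumptions rather than the authors' code, and the paper's exact normalization for the "fraction" of feed-forward loops is not given here.

```python
from collections import deque


def subnet(trn: dict[str, list[str]], root: str) -> set[str]:
    """All nodes topologically downstream of `root`, including `root` itself (BFS)."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for target in trn.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def count_ffls(trn: dict[str, list[str]], nodes: set[str]) -> int:
    """Count feed-forward loops X->Y, Y->Z, X->Z with three distinct nodes inside `nodes`."""
    count = 0
    for x in nodes:
        direct = set(trn.get(x, [])) & nodes  # targets of X that lie inside the subnet
        for y in direct - {x}:
            for z in set(trn.get(y, [])):
                if z in direct and z not in (x, y):
                    count += 1
    return count


# Hypothetical toy TRN: A regulates B, C and D; B and C each also regulate D.
TOY_TRN = {"A": ["B", "C", "D"], "B": ["D"], "C": ["D"], "E": ["B"]}

print(sorted(subnet(TOY_TRN, "A")))               # ['A', 'B', 'C', 'D']
print(count_ffls(TOY_TRN, subnet(TOY_TRN, "A")))  # 2
print(count_ffls(TOY_TRN, subnet(TOY_TRN, "E")))  # 0
```

    Breadth-first search visits each edge once, and the `seen` set keeps the traversal finite even when the network contains regulatory cycles.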

    FLORA: a novel method to predict protein function from structure in diverse superfamilies

    Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives, where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional sub-families (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the method makes no prior prediction of functional sites, nor does it assume specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms popular structure comparison methods and a leading function prediction method, with a 2–3-fold increase in coverage at low error rates. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures purely from patterns of structural conservation across all residues.
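    The benchmark claim above is phrased as coverage at low error rates. One common way to read such a metric (an assumption here, since the abstract does not define it) is: rank predictions by score, then report the largest fraction of true matches recovered while the error rate of the accepted predictions stays at or below a chosen threshold. The sketch below is a generic evaluation helper, not FLORA's template-matching method; the function name and toy scores are hypothetical.

```python
def coverage_at_error_rate(scored: list[tuple[float, bool]], max_error: float) -> float:
    """Best coverage (true matches recovered / all true matches) with error rate <= max_error.

    `scored` holds (score, is_correct) pairs; higher scores are accepted first.
    Error rate is taken here as false positives / predictions accepted so far.
    Tied scores are not grouped, so ties are broken by input order (stable sort).
    """
    total_correct = sum(1 for _, ok in scored if ok)
    if total_correct == 0:
        return 0.0
    best = 0.0
    tp = fp = 0
    for _, ok in sorted(scored, key=lambda pair: pair[0], reverse=True):
        if ok:
            tp += 1
        else:
            fp += 1
        if fp / (tp + fp) <= max_error:
            best = max(best, tp / total_correct)
    return best


# Hypothetical scores for five predicted functional matches.
toy_scores = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, True)]
print(coverage_at_error_rate(toy_scores, max_error=0.0))   # 0.5
print(coverage_at_error_rate(toy_scores, max_error=0.25))  # 1.0
```

    Comparing two methods at the same `max_error` then amounts to comparing their returned coverages, which is the sense in which a "2–3-fold increase in coverage" would be measured under this reading.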

    The 3-Hydroxy-2-Butanone Pathway Is Required for Pectobacterium carotovorum Pathogenesis

    Pectobacterium species are necrotrophic bacterial pathogens that cause soft rot diseases in potatoes and several other crops worldwide. Gene expression data identified Pectobacterium carotovorum subsp. carotovorum budB, which encodes the α-acetolactate synthase enzyme in the 2,3-butanediol pathway, as more highly expressed in potato tubers than in potato stems. This pathway is of interest because volatiles produced by the 2,3-butanediol pathway have been shown to act as plant growth-promoting molecules and insect attractants and, in other bacterial species, to affect virulence and fitness. Disruption of the 2,3-butanediol pathway reduced virulence of P. c. subsp. carotovorum WPP14 on potato tubers and impaired alkalinization of growth medium and potato tubers under anaerobic conditions. Alkalinization of the milieu via this pathway may aid in plant cell maceration, since Pectobacterium pectate lyases are most active at alkaline pH.

    An Assessment of the Role of DNA Adenine Methyltransferase on Gene Expression Regulation in E coli

    N6-adenine methylation is an important epigenetic signal that regulates processes such as DNA replication, DNA repair and transcription. In γ-proteobacteria, Dam is a stand-alone enzyme that methylates GATC sites, which are non-randomly distributed in the genome. Some of these sites overlap with transcription factor binding sites. This work describes a global computational analysis of a published Dam knockout microarray dataset, alongside other publicly available data, to gain insight into the extent to which Dam regulates transcription by interfering with protein binding. The results indicate that DNA methylation by Dam may not globally affect gene transcription by physically blocking the access of transcription factors to their binding sites. Down-regulation of Dam during stationary phase correlates with the activity of transcription factors (TFs) whose binding sites are enriched for GATC sites.
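    Because Dam methylates the adenine in GATC, checking whether a transcription factor binding site overlaps a GATC site reduces to a sequence scan plus an interval test. A minimal Python sketch, assuming binding sites are given as half-open (start, end) coordinates on the same sequence; the toy sequence, coordinates and function names are hypothetical, and this is not the authors' analysis pipeline.

```python
import re


def gatc_positions(sequence: str) -> list[int]:
    """0-based start positions of every GATC site in `sequence`.

    GATC is its own reverse complement, so scanning one strand finds sites on both.
    GATC also cannot overlap itself, so re.finditer misses no occurrences.
    """
    return [m.start() for m in re.finditer("GATC", sequence.upper())]


def sites_with_gatc(sequence: str, binding_sites: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Binding sites (half-open [start, end)) that share at least one base with a GATC site."""
    gatc = gatc_positions(sequence)
    return [
        (start, end)
        for start, end in binding_sites
        if any(pos < end and pos + 4 > start for pos in gatc)
    ]


seq = "TTGATCAAACCCGGGATCTT"        # hypothetical sequence
sites = [(0, 6), (6, 12), (12, 16)]  # hypothetical binding-site coordinates
print(gatc_positions(seq))           # [2, 14]
print(sites_with_gatc(seq, sites))   # [(0, 6), (12, 16)]
```

    Counting such overlaps per transcription factor is one way to ask whether a factor's binding sites are enriched for GATC sites.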

    Finite Intersection Property and Dynamical Compactness

    Dynamical compactness with respect to a family, as a new concept of chaoticity of a dynamical system, was introduced and discussed in Huang et al. (J Differ Equ 260(9):6800-6827, 2016). In this paper we continue to investigate this notion. In particular, we prove that all dynamical systems are dynamically compact with respect to a Furstenberg family if and only if this family has the finite intersection property. We investigate weak mixing and weak disjointness by using the concept of dynamical compactness. We also explore further differences between transitive compactness and weak mixing. As a byproduct, we show that the -limit and the -limit sets of a point may have quite different topological structures. Moreover, the equivalence between multi-sensitivity, sensitive compactness and transitive sensitivity is established for a minimal system. Finally, these notions are also explored in the context of linear dynamics.
    Wen Huang and Sergii Kolyada acknowledge the hospitality of the School of Mathematical Sciences of Fudan University, Shanghai. Sergii Kolyada also acknowledges the hospitality of the Max-Planck-Institut für Mathematik (MPIM) in Bonn, the Departament de Matemàtica Aplicada of the Universitat Politècnica de València, and the Department of Mathematics of the Chinese University of Hong Kong, as well as the partial support of Project MTM2013-47093-P. We thank the referees for their careful reading and constructive comments, which have resulted in substantial improvements to this paper. Wen Huang was supported by NNSF of China (11225105, 11431012); Alfred Peris was supported by MINECO, Projects MTM2013-47093-P and MTM2016-75963-P, and by GVA, Project PROMETEOII/2013/013; and Guohua Zhang was supported by NNSF of China (11671094).
    Huang, W.; Khilko, D.; Kolyada, S.; Peris Manguillot, A.; Zhang, G. (2018). Finite Intersection Property and Dynamical Compactness. Journal of Dynamics and Differential Equations. 30(3):1221-1245.
    https://doi.org/10.1007/s10884-017-9600-8
    References:
    Akin, E.: Recurrence in topological dynamics: Furstenberg families and Ellis actions. The University Series in Mathematics. Plenum Press, New York (1997)
    Akin, E., Auslander, J., Berg, K.: When is a transitive map chaotic? In: Convergence in ergodic theory and probability (Columbus, OH, 1993), Ohio State Univ. Math. Res. Inst. Publ., vol. 5, pp. 25–40. de Gruyter, Berlin (1996)
    Akin, E., Glasner, E.: Residual properties and almost equicontinuity. J. Anal. Math. 84, 243–286 (2001)
    Akin, E., Kolyada, S.: Li–Yorke sensitivity. Nonlinearity 16(4), 1421–1433 (2003)
    Auslander, J.: Minimal flows and their extensions. North-Holland Mathematics Studies, vol. 153; Notas de Matemática [Mathematical Notes], 122. North-Holland Publishing Co., Amsterdam (1988)
    Auslander, J., Yorke, J.A.: Interval maps, factors of maps, and chaos. Tôhoku Math. J. (2) 32(2), 177–188 (1980)
    Bayart, F., Matheron, É.: Dynamics of Linear Operators. Cambridge Tracts in Mathematics, vol. 179. Cambridge University Press, Cambridge (2009)
    Bès, J., Peris, A.: Hereditarily hypercyclic operators. J. Funct. Anal. 167(1), 94–112 (1999)
    Blanchard, F., Huang, W.: Entropy sets, weakly mixing sets and entropy capacity. Discrete Contin. Dyn. Syst. 20(2), 275–311 (2008)
    de la Rosa, M., Read, C.: A hypercyclic operator whose direct sum T ⊕ T is not hypercyclic. J. Oper. Theory 61(2), 369–380 (2009)
    Dowker, Y.N., Friedlander, F.G.: On limit sets in dynamical systems. Proc. Lond. Math. Soc. (3) 4, 168–176 (1954)
    Downarowicz, T.: Survey of odometers and Toeplitz flows. In: Algebraic and topological dynamics, Contemp. Math., vol. 385, pp. 7–37. Amer. Math. Soc., Providence, RI (2005)
    Edwards, R.E.: Functional analysis: Theory and applications. Corrected reprint of the 1965 original. Dover Publications Inc., New York (1995)
    Furstenberg, H.: Disjointness in ergodic theory, minimal sets, and a problem in Diophantine approximation. Math. Syst. Theory 1, 1–49 (1967)
    Furstenberg, H.: Recurrence in ergodic theory and combinatorial number theory. M. B. Porter Lectures. Princeton University Press, Princeton, NJ (1981)
    Furstenberg, H., Weiss, B.: Topological dynamics and combinatorial number theory. J. Anal. Math. 34 (1978), 61–85 (1979)
    Glasner, E., Weiss, B.: Sensitive dependence on initial conditions. Nonlinearity 6(6), 1067–1075 (1993)
    Grosse-Erdmann, K.-G., Peris, A.: Weakly mixing operators on topological vector spaces. Rev. R. Acad. Cienc. Exactas Fís. Nat. Ser. A Math. RACSAM 104(2), 413–426 (2010)
    Grosse-Erdmann, K.-G., Peris-Manguillot, A.: Linear chaos. Universitext. Springer, London (2011)
    Guckenheimer, J.: Sensitive dependence to initial conditions for one-dimensional maps. Commun. Math. Phys. 70(2), 133–160 (1979)
    Halpern, J.D.: Bases in vector spaces and the axiom of choice. Proc. Am. Math. Soc. 17, 670–673 (1966)
    He, W.H., Zhou, Z.L.: A topologically mixing system whose measure center is a singleton. Acta Math. Sin. (Chin. Ser.) 45(5), 929–934 (2002)
    Huang, W., Khilko, D., Kolyada, S., Zhang, G.: Dynamical compactness and sensitivity. J. Differ. Equ. 260(9), 6800–6827 (2016)
    Huang, W., Kolyada, S., Zhang, G.: Analogues of Auslander–Yorke theorems for multi-sensitivity. Ergod. Theory Dyn. Syst. 22, 1–15 (2016). doi:10.1017/etds.2016.48
    Huang, W., Ye, X.: Devaney's chaos or 2-scattering implies Li–Yorke's chaos. Topol. Appl. 117(3), 259–272 (2002)
    Kelley, J.L.: General topology. Graduate Texts in Mathematics, vol. 27. Springer, New York (1975). Reprint of the 1955 edition [Van Nostrand, Toronto, ON]
    Kolyada, S., Snoha, L., Trofimchuk, S.: Noninvertible minimal maps. Fund. Math. 168(2), 141–163 (2001)
    Li, J.: Transitive points via Furstenberg family. Topol. Appl. 158(16), 2221–2231 (2011)
    Li, J., Ye, X.D.: Recent development of chaos theory in topological dynamics. Acta Math. Sin. (Engl. Ser.) 32(1), 83–114 (2016)
    Liu, H., Liao, L., Wang, L.: Thickly syndetical sensitivity of topological dynamical system. Discrete Dyn. Nat. Soc., Art. ID 583431, 4 pp. (2014)
    Moothathu, T.K.S.: Stronger forms of sensitivity for dynamical systems. Nonlinearity 20(9), 2115–2126 (2007)
    Mycielski, J.: Independent sets in topological algebras. Fund. Math. 55, 139–147 (1964)
    Oprocha, P., Zhang, G.: On local aspects of topological weak mixing in dimension one and beyond. Stud. Math. 202(3), 261–288 (2011)
    Oprocha, P., Zhang, G.: On local aspects of topological weak mixing, sequence entropy and chaos. Ergod. Theory Dyn. Syst. 34(5), 1615–1639 (2014)
    Petersen, K.E.: Disjointness and weak mixing of minimal sets. Proc. Am. Math. Soc. 24, 278–280 (1970)
    Read, C.J.: The invariant subspace problem for a class of Banach spaces. II. Hypercyclic operators. Isr. J. Math. 63(1), 1–40 (1988)
    Ruelle, D.: Dynamical systems with turbulent behavior. In: Mathematical problems in theoretical physics (Proc. Internat. Conf., Univ. Rome, Rome, 1977), Lecture Notes in Phys., vol. 80, pp. 341–360. Springer, Berlin (1978)
    Šarkovskiĭ, A.N.: Continuous mapping on the limit points of an iteration sequence. Ukrain. Mat. Ž. 18(5), 127–130 (1966)
    Weiss, B.: A survey of generic dynamics. In: Descriptive set theory and dynamical systems (Marseille-Luminy, 1996), London Math. Soc. Lecture Note Ser., vol. 277, pp. 273–291. Cambridge Univ. Press, Cambridge (2000)

    Gene fusions and gene duplications: relevance to genomic annotation and functional analysis

    BACKGROUND: Escherichia coli, a model organism, provides information for the annotation of other genomes. Our analysis of its genome has shown that proteins encoded by fused genes need special attention. Such composite (multimodular) proteins consist of two or more components (modules) encoding distinct functions. Multimodular proteins have been found to complicate both annotation and the generation of sequence-similar groups. Previous work overstated the number of multimodular proteins in E. coli. This work corrects the identification of modules by including sequence information from proteins in 50 sequenced microbial genomes. RESULTS: Multimodular E. coli K-12 proteins were identified from sequence similarities between their component modules and non-fused proteins in 50 genomes, and from the literature. We found 109 multimodular proteins in E. coli containing either two or three modules. Most modules had standalone sequence relatives in other genomes. The separated modules, together with all the single (unfused) proteins, constitute the full set of unimodular proteins of E. coli. Pairwise sequence relationships among all E. coli unimodular proteins generated 490 sequence-similar, paralogous groups. Groups ranged in size from 2 to 92 members and had varying degrees of relatedness among their members. Some E. coli enzyme groups were compared to homologs in other bacterial genomes. CONCLUSION: The deleterious effects of multimodular proteins on annotation and on the formation of groups of paralogs are emphasized. To improve annotation results, all multimodular proteins in an organism should be detected and, where known, each function should be linked to its location in the protein sequence. When transferring functions by sequence similarity, alignment locations must be noted, particularly when alignments cover only part of the sequences, to enable transfer of the correct function. Separating multimodular proteins into module units makes it possible to generate protein groups related by both sequence and function, avoiding the mixing of unrelated sequences. Organisms differ in the sizes of their groups of sequence-related proteins. A sample comparison of orthologs to selected E. coli paralogous groups correlates with known physiological and taxonomic relationships between the organisms.
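    The argument for splitting multimodular proteins before grouping can be made concrete with single-linkage clustering, in which any chain of pairwise similarities joins proteins into one group. The abstract does not state the authors' grouping rule, so single linkage via union-find is an assumption here, and the protein and module labels are hypothetical. The example shows how an unsplit two-module protein bridges two unrelated families into one group, while splitting it into modules keeps them apart.

```python
def paralog_groups(proteins: list[str], similar_pairs: list[tuple[str, str]]) -> list[list[str]]:
    """Single-linkage groups (size >= 2) from pairwise similarity links, via union-find."""
    parent = {p: p for p in proteins}

    def find(p: str) -> str:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps the trees shallow
            p = parent[p]
        return p

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: dict[str, list[str]] = {}
    for p in proteins:
        groups.setdefault(find(p), []).append(p)
    # Singletons have no paralog, so only groups of two or more are reported, largest first.
    return sorted((sorted(g) for g in groups.values() if len(g) >= 2), key=len, reverse=True)


# Hypothetical: protein B is a fusion whose first module resembles A and whose second resembles C.
unsplit = paralog_groups(["A", "B", "C", "D", "E"], [("A", "B"), ("B", "C"), ("C", "D")])
split = paralog_groups(
    ["A", "B.m1", "B.m2", "C", "D", "E"], [("A", "B.m1"), ("B.m2", "C"), ("C", "D")]
)
print(unsplit)  # [['A', 'B', 'C', 'D']]
print(split)    # [['B.m2', 'C', 'D'], ['A', 'B.m1']]
```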

    A Measure of the Promiscuity of Proteins and Characteristics of Residues in the Vicinity of the Catalytic Site That Regulate Promiscuity

    Promiscuity, the basis for the evolution of new functions through ‘tinkering’ of residues in the vicinity of the catalytic site, has yet to be quantitatively defined. We present a computational method, Promiscuity Indices Estimator (PROMISE), based on signatures derived from the spatial and electrostatic properties of the catalytic residues, to estimate the promiscuity (PromIndex) of proteins with known active-site residues and 3D structure. PromIndex reflects the number of different active-site signatures that have congruent matches in close proximity to the protein's native catalytic site, the quality of those matches, and the difference in enzymatic activity. Promiscuity in proteins is observed to follow a lognormal distribution (μ = 0.28, σ = 1.1; reduced chi-square = 3.0 × 10^-5). The promiscuous functions that PROMISE predicts for any protein can serve as starting points for directed evolution experiments. PROMISE ranks carboxypeptidase A and ribonuclease A among the more promiscuous proteins. We have also investigated the properties of the residues in the vicinity of the catalytic site that regulate its promiscuity. Linear regression establishes a weak correlation (R² ≈ 0.1) between certain properties of the residues (charge, polarity, etc.) in the neighborhood of the catalytic residues and PromIndex. A stronger relationship holds: most proteins with high promiscuity have high percentages of charged and polar residues within a radius of 3 Å of the catalytic site, which is validated using one-tailed hypothesis tests (P-values ≈ 0.05). Since these characteristics are known to be key factors in catalysis, their relationship with the promiscuity index cross-validates the PROMISE methodology.
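    The 3 Å neighborhood statistic above can be sketched as a distance filter around the catalytic residues. This is a minimal Python illustration under stated assumptions, not PROMISE itself: each residue is reduced to one representative coordinate, a residue counts as a neighbor if it lies within the radius of any catalytic residue, and the charged and polar classes below are one common convention (classifications vary, for example for histidine). All names and coordinates are hypothetical.

```python
import math

# Assumed residue classes (three-letter codes); conventions differ between studies.
CHARGED = {"ASP", "GLU", "LYS", "ARG", "HIS"}
POLAR = {"SER", "THR", "ASN", "GLN", "TYR", "CYS"}


def neighborhood_composition(
    residues: dict[int, tuple[str, tuple[float, float, float]]],
    catalytic_ids: set[int],
    radius: float = 3.0,
) -> tuple[float, float]:
    """Percent charged and percent polar among non-catalytic residues near the catalytic site.

    `residues` maps residue id -> (three-letter name, representative xyz in Å).
    """
    centers = [residues[i][1] for i in catalytic_ids]
    neighbors = [
        name
        for rid, (name, xyz) in residues.items()
        if rid not in catalytic_ids and any(math.dist(xyz, c) <= radius for c in centers)
    ]
    if not neighbors:
        return 0.0, 0.0
    charged = 100.0 * sum(n in CHARGED for n in neighbors) / len(neighbors)
    polar = 100.0 * sum(n in POLAR for n in neighbors) / len(neighbors)
    return charged, polar


# Hypothetical toy structure: residue 1 is catalytic; residue 5 is too far from it to count.
toy_structure = {
    1: ("HIS", (0.0, 0.0, 0.0)),
    2: ("ASP", (2.0, 0.0, 0.0)),
    3: ("SER", (0.0, 2.5, 0.0)),
    4: ("LEU", (0.0, 0.0, 2.9)),
    5: ("GLU", (5.0, 0.0, 0.0)),
}
print(neighborhood_composition(toy_structure, {1}))  # about one third charged, one third polar
```

    A per-protein pair of percentages like this is the kind of feature that could then be regressed against, or tested for association with, a promiscuity score.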

    Origin and Examination of a Leafhopper Facultative Endosymbiont

    Eukaryotes engage in intimate interactions with microbes that vary in age and type of association. Although many conspicuous examples of ancient insect associates have been studied (e.g., Buchnera aphidicola), fewer examples of younger associations are known. Here, we further characterize a recently evolved bacterial endosymbiont of the leafhopper Euscelidius variegatus (Hemiptera, Cicadellidae), called BEV. We found that BEV, which has been continuously maintained in E. variegatus hosts at UC Berkeley since 1984, is vertically transmitted with high fidelity. Unlike many vertically transmitted, ancient endosymbioses, the BEV–E. variegatus association is not obligate for either partner, and BEV can be cultivated axenically. Sufficient BEV colonies were grown and harvested to estimate its genome size and provide a partial survey of the genome sequence. The BEV chromosome is about 3.8 Mbp, and there is evidence for an extrachromosomal element roughly 53 kb in size (e.g., a prophage or plasmid). We sequenced 438 kb of unique short-insert clones, representing about 12% of the BEV genome. Nearly half of the gene fragments were similar to mobile DNA, including 15 distinct types of insertion sequences (IS). Analyses revealed that BEV not only shares virulence genes with plant pathogens but is also closely related to the plant-pathogenic genera Dickeya, Pectobacterium, and Brenneria. However, the slightly reduced genome size, abundance of mobile DNA, fastidious growth in culture, and efficient vertical transmission suggest that symbiosis with E. variegatus has had a significant impact on genome evolution in BEV.

    Figure Text Extraction in Biomedical Literature

    Background: Figures are ubiquitous in biomedical full-text articles, and they represent important biomedical knowledge. However, the sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Therefore, we are developing the Biomedical Figure Search engine