
    IgTM: An algorithm to predict transmembrane domains and topology in proteins

    Background: Owing to their roles as receptors or transporters, membrane proteins play a key role in many important biological functions. In our work we used Grammatical Inference (GI) to localize transmembrane segments. Our GI process is based specifically on the inference of Even Linear Languages. Results: We obtained values close to 80% in both specificity and sensitivity. Six datasets were used for the experiments, considering different encodings for the input sequences. An encoding that includes the topology changes in the sequence (from inside and outside the membrane to it and vice versa) allowed us to obtain the best results. This software is publicly available at: http://www.dsic.upv.es/users/tlcc/bio/bio.html Conclusion: We compared our results with other well-known methods, which obtain slightly better precision. However, this work shows that it is possible to apply Grammatical Inference techniques effectively to bioinformatics problems.

    Connectable Components for Protein Design

    Protein design requires reusable, trustworthy, and connectable parts in order to scale to complex challenges. The recent explosion of protein structures stored within the Protein Data Bank provides a wealth of small motifs we can harvest, but we still lack tools to combine them into larger proteins. Here I explore two approaches for connecting reusable protein components on two different length scales. On the atomic scale, I build an interactive search engine for connecting chemical fragments together. Protein fragments built using this search engine recapitulate native-like protein assemblies that can be integrated into existing protein scaffolds using backbone search engines such as MaDCaT. On the protein domain scale, I quantitatively dissect structural variations in two-component systems in order to extract general principles for engineering interfacial flexibility between modular four-helix bundles. These bundles exhibit large scissoring motions where helices move towards or away from the bundle axis, and these motions propagate across domain boundaries. Together, these two approaches form the beginnings of a multiscale methodology for connecting reusable protein fragments where there is a constant interplay and feedback between design of atomic structure, secondary structure, and tertiary structure. Rapid iteration, visualization, and search glue these diverse length scales together into a cohesive whole.

    Probabilistic grammatical model of protein language and its application to helix-helix contact site classification

    BACKGROUND: Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not directly convey information on medium- and long-range residue-residue interactions. Capturing such interactions requires expressive power at least equal to that of context-free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. RESULTS: In this work, we present a probabilistic grammatical framework for problem-specific protein languages and apply it to the classification of transmembrane helix-helix pair configurations. The core of the model consists of a probabilistic context-free grammar, automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training samples. The model was applied to produce sequence-based descriptors of four classes of transmembrane helix-helix contact site configurations. The best-performing classifier reached an AUC ROC of 0.70. The analysis of grammar parse trees revealed their ability to represent structural features of helix-helix contact sites. CONCLUSIONS: We demonstrated that our probabilistic context-free framework for analysis of protein sequences outperforms the state of the art in the task of helix-helix contact site classification. However, this is achieved without necessarily requiring modeling of long-range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human-readable. Thus they could provide biologically meaningful information for molecular biologists.

    Origin, evolution and stability of overlapping genes in viruses: A systematic review

    During their long evolutionary history, viruses generated many proteins de novo by a mechanism called “overprinting”. Overprinting is a process in which critical nucleotide substitutions in a pre-existing gene can induce the expression of a novel protein by translation of an alternative open reading frame (ORF). Overlapping genes represent an intriguing example of adaptive conflict, because they simultaneously encode two proteins whose freedom to change is constrained by each other. However, overlapping genes are also a source of genetic novelties, as the constraints under which alternative ORFs evolve can give rise to proteins with unusual sequence properties, most importantly the potential for novel functions. Starting with the discovery of overlapping genes in phages infecting Escherichia coli, this review covers a range of studies dealing with detection of overlapping genes in small eukaryotic viruses (genomic length below 30 kb) and recognition of their critical role in the evolution of pathogenicity. The origin of overlapping genes, the factors that favor their birth and retention, and how they manage their inherent adaptive conflict are extensively reviewed. Special attention is paid to the assembly of overlapping genes into ad hoc databases, suitable for future studies, and to the development of statistical methods for exploring viral genome sequences in search of undiscovered overlaps.

    Proteomics of Toxoplasma gondii

    The Apicomplexan parasite Toxoplasma gondii is an obligate intracellular parasite. Infection by T. gondii causes the disease toxoplasmosis, which is one of the most prevalent parasitic diseases of animals and humans. It has been 100 years since the first discovery of the parasite in 1908; research on T. gondii has been carried out in many scientific disciplines, consistently expanding the understanding of this parasite. In the last ten years, the development of EST, microarray and genome sequencing, together with continuing efforts towards genome annotation, has centralized the focus of T. gondii research on the understanding of gene expression and gene function on the genome scale. Equipped with the technical advances in mass spectrometry and bioinformatics, proteomics has become established as an integral component of the post-genomics era by providing first-hand data on the functional products of gene expression. In this study, three complementary proteomic strategies, 1-DE, 2-DE and MudPIT, have been used to characterise the proteome of T. gondii tachyzoites. Protein identifications have been acquired for more than two thousand (2252) unique release 4 genes, representing almost one third (29%) of the predicted proteome of all life cycle stages. Functional predictions for each protein were carried out, which provided valuable insights into the composition of the expressed proteome and the potential biological roles of its proteins. The T. gondii proteomic data has been integrated into the publicly accessible ToxoDB, where 2477 intron-spanning peptides provided supporting evidence for correct splice site annotation of the release 4 genome annotation. The incompleteness of the release 4 genome annotation has been highlighted using peptide evidence, confirming 421 splice sites that are only predicted by alternative gene models. Analysis has also been carried out on the proteomic data in the light of other genome-wide expression data. The comparison of the proteome and transcriptome of Toxoplasma and other Apicomplexa parasites has revealed important discrepancies between protein and mRNA expression, and interesting candidates have been highlighted for further investigation. A preliminary DIGE study has been developed to characterize protein expression changes in T. gondii grown in the presence or absence of glucose. In conclusion, this study has demonstrated the importance of proteomic applications in understanding gene expression profiles and regulation in T. gondii and highlighted the importance and potential of proteogenomic approaches in the genome annotation process. The importance of temporal and quantitative proteomics as well as the future of systems biology has been discussed.

    Deciphering type IV pilus biology in the Gram-positive opportunistic pathogen Streptococcus sanguinis

    Type IV pili (Tfp) are the paradigm of a large group of diverse and functionally versatile nanomachines, intensively studied in Gram-negative bacteria. However, details regarding the molecular mechanisms of Tfp biogenesis and/or Tfp-mediated functions are still unclear. Thus, owing to the inherent lack of outer cell wall in Gram-positive bacteria, my PhD has focused on molecular characterisation of Tfp in a simpler such bacterium, Streptococcus sanguinis. My work has shown that the naturally competent S. sanguinis produces bona fide retractable Tfp that enable twitching motility but are dispensable for competence. Unlike Gram-negative Tfp, S. sanguinis Tfp are unusual in that they are composed of two pilin proteins, a feature likely to be shared by other Gram-positive Tfp-expressing species. All the genes involved in Tfp biology in S. sanguinis are found within a pil locus encoding 21 proteins. A systematic genetic study highlighted that only 10 proteins are required for Tfp biogenesis, whilst another four modulate twitching motility. To enhance genetic manipulation of S. sanguinis, a markerless mutagenesis strategy was devised, enabling us to make various mutations in situ, which helped us characterise some of these proteins further. Via this methodology, the last six genes of the pil locus were found to be completely dispensable for Tfp biology. To get an overall structural picture of Tfp in S. sanguinis, the structure of one of the major pilins (PilE1) was determined by NMR. Moreover, three pilin-like proteins within the pil locus were found to be minor Tfp components. Collectively, my work has established S. sanguinis as a robust Gram-positive model organism for studying Tfp, which paves the way for interesting future studies.

    Bioprospecting of Trichococcus species

    Since the discovery of penicillin in 1928, the value of microbes in our society has been significantly reconsidered. Nowadays, 60% of commercial drugs and products mimic or derive from microbial metabolites. After almost a century, can we find new compounds, and where? To address this question, we need a large-scale screening of microbial capabilities. Trichococcus species have multiple genes for producing 1,3-propanediol (1,3-PDO), which is used to synthesize the partially biodegradable plastic PTT. Based on this, we developed a strategy for analyzing 90,000 bacterial genomes that eventually generated information for every microbial characteristic. The outstanding factor is that all this information is stored in a database that can be easily mined for everything. This collective and unbiased strategy resulted in identifying the key genes for efficient production of 1,3-PDO. We discovered 187 novel candidates that can produce 1,3-PDO, and some were confirmed in the lab. Another result of the screening concerned Trichococcus patagoniensis. This bacterium grows at minus 5 degrees without oxygen and was discovered by NASA scientists to simulate life on other planets. When it is cold and without oxygen, the “extra-terrestrial” properties of T. patagoniensis allow it to create its own “blanket” by producing exopolymer saccharides. We characterized this cryoprotectant compound as inulin, which prevents crystallization of water and is used by many plants to preserve their roots at subzero temperatures. Furthermore, inulin is a commercial prebiotic and is connected with gut health. Within the bacterial kingdom, only a limited number of members produce inulin, and none of them had been identified as psychrotolerant species. T. patagoniensis produces plenty of inulin and, owing to its robustness, could easily become the next biofactory for the compound. The methods applied in this PhD thesis are a platform for mining any bacterial or metabolic information. All the knowledge is there and we need to dive into it. Every finding will be revolutionary and expand our perspective on microbes. Big data mining is like Zeno's dichotomy paradox: we will always know half and never everything.

    Serotonergic transcriptional regulatory logic in Caenorhabditis elegans

    Neuronal diversity in the nervous system is generated through the activation of multiple unique batteries of terminal differentiation genes, which determine the functional properties of the distinct mature neurons. It is generally accepted that transcription factors (TFs) bind in a combinatorial and cooperative manner to DNA sequences of the genome called enhancers, placing TFs as the main regulators of gene expression. However, how these combinations of TFs identify and activate their target sequences is poorly understood. In this work we use the serotonergic neurons as a paradigm to unravel the regulatory rules that select a cell type-specific transcriptome during terminal differentiation. Serotonergic neurons are present in all eumetazoan groups and are universally defined by their ability to synthesise and release serotonin (5-HT), which is achieved by the expression of the ‘5-HT pathway genes’. Taking advantage of this phylogenetic conservation, we use the simple model organism Caenorhabditis elegans to dissect the transcriptional regulatory logic of serotonergic neurons. C. elegans hermaphrodites have three functionally different serotonergic subclasses: the HSN motorneuron, the ADF sensory neuron and the NSM neurosecretory motorneuron. 
All three neuron subtypes express the 5-HT pathway genes. Through an in vivo cis-regulatory analysis of these genes we have identified independent cis-regulatory modules (CRMs) responsible for their expression in each serotonergic neuron subtype. This modular organisation suggests that different regulatory logics are employed in each neuron subclass to activate its terminal transcriptome. To deepen our understanding of how cell type-specific transcriptional programmes are implemented, we decided to focus the rest of our work on the best characterised serotonergic neuron subtype, the HSN neuron, and carried out an extensive dissection of HSN terminal differentiation transcriptional rules. Loss-of-function mutant and cis-regulatory analyses reveal that direct activation of the HSN transcriptome is orchestrated by a code of six TFs, which we have termed the HSN TF collective. This TF code is composed of AST-1 (ETS TF family), UNC-86 (POU TF family), SEM-4 (SPALT TF family), HLH-3 (bHLH TF family), EGL-46 (INSM TF family) and EGL-18 (GATA TF family). The expression of the HSN TF collective is sufficient to induce serotonergic fate in some specific contexts and is required throughout the life of the animal in order to maintain the identity of the HSN neuron. Bioinformatically identified binding site clusters for the six TFs of the HSN TF collective are enriched in known HSN-expressed genes compared to a random set of genes. Through in vivo reporter analysis, we demonstrate that this clustering constitutes a regulatory signature that is sufficient for de novo identification of functional HSN neuron enhancers. This regulatory signature contains certain syntactic constraints that further improve the prediction of enhancer expression in the cell. Mouse orthologues of most members of the HSN TF collective are known regulators of the mammalian serotonergic differentiation programme. This homology between the two serotonergic regulatory programmes allows for the identification of an additional candidate TF in the worm (PHA-4), orthologue of mouse FOXA2, and a mouse TF (SALL2), orthologue of the worm SEM-4. Moreover, we show that mouse orthologues can functionally substitute for their worm counterparts. Finally, Principal Coordinates Analysis suggests that, among C. elegans neurons, the HSN transcriptome most closely resembles that of mouse serotonergic neurons, which reveals deep homology. Our results show that a regulatory signature based on a defined set of TFs is sufficient for enhancer identification using primary DNA sequence. Moreover, our results identify rules governing the transcriptional regulatory code of a critically important neuronal type in two species separated by over 700 million years.

    Hearing Loss

    Authored by 17 international researchers and research teams, the book provides up-to-date insights on topics in five different research areas related to normal hearing and deafness. Techniques for assessment of hearing and the appropriateness of the Mongolian gerbil as a model for age-dependent hearing loss in humans are presented. Parental attitudes to childhood deafness and the role of early intervention in better treatment of hearing loss are also discussed. Comprehensive details are provided on the role of different environmental insults, including injuries, in causing deafness. Additionally, many genes involved in hearing loss are reviewed, and the genetics of recessively inherited moderate to severe and progressive deafness is covered for the first time. The book also details established and evolving therapies for treatment of deafness.