
    Unique features of Plasmids among different Citrobacter species

    The _Citrobacter_ plasmids are supposed to represent the host genetic association within the living bacterial cell. The plasmids impart various beneficial characteristics to the host, helping it to retain suitable characteristics for adaptation as well as evolution. The study aims at understanding the role of prophage in influencing host functional characteristics by horizontal gene transfer or as whole plasmids. The _Citrobacter_ plasmid can be understood by analyzing the many hypothetical protein sequences within its genome. Our study included 82 hypothetical proteins in 5 _Citrobacter_ plasmid genomes. Functions were predicted for 31 hypothetical proteins, and 3-D structures were predicted for 11 protein sequences using the PS2 server. The probable functions were predicted using bioinformatics web tools such as CDD-BLAST, INTERPROSCAN, PFAM and COGs, by searching sequence databases for the presence of orthologous enzymatic conserved domains in the hypothetical sequences. This study identified many uncharacterized proteins whose roles in _Citrobacter_ plasmids are yet to be discovered. These results for unknown proteins within plasmids can be used in linking the genetic interactions of _Citrobacter_ species and their functions in different environmental conditions

    Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences

    A new algorithm is presented for vocabulary analysis (word detection) in texts of human origin. It performs at 60%–70% overall accuracy and greater than 80% accuracy for longer words, and approximately 85% sensitivity on Alice in Wonderland, a considerable improvement on previous methods. When applied to protein sequences, it detects short sequences analogous to words in human texts, i.e. intolerant to changes in spelling (mutation), and relatively context-independent in their meaning (function). Some of these are homonyms of up to 7 amino acids, which can assume different structures in different proteins. Others are ultra-conserved stretches of up to 18 amino acids within proteins of less than 40% overall identity, reflecting extreme constraint or convergent evolution. Different species are found to have qualitatively different major peptide vocabularies, e.g. some are dominated by large gene families, while others are rich in simple repeats or dominated by internally repetitive proteins. This suggests the possibility of a peptide vocabulary signature, analogous to genome signatures in DNA. Homonyms may be useful in detecting convergent evolution and positive selection in protein evolution. Ultra-conserved words may be useful in identifying structures intolerant to substitution over long periods of evolutionary time

    The ever-evolving concept of the gene: The use of RNA/Protein experimental techniques to understand genome functions

    The completion of the human genome sequence together with advances in sequencing technologies have shifted the paradigm of the genome, as composed of discrete and hereditable coding entities, and have shown the abundance of functional noncoding DNA. This part of the genome, previously dismissed as "junk" DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant nonprotein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments which have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years

    Functional Coverage of the Human Genome by Existing Structures, Structural Genomics Targets, and Homology Models

    The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB), target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB), it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the “most wanted list” that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html

    MODBASE, a database of annotated comparative protein structure models and associated resources.

    MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE currently contains 5,152,695 reliable models for domains in 1,593,209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/)

    Nucleotide sequence and genomic organization of an ophiovirus associated with lettuce big-vein disease

    The complete nucleotide sequence of an ophiovirus associated with lettuce big-vein disease has been elucidated. The genome consisted of four RNA molecules of approximately 7.8, 1.7, 1.5 and 1.4 kb. Virus particles were shown to contain nearly equimolar amounts of RNA molecules of both polarities. The 5'- and 3'-terminal ends of the RNA molecules are largely, but not perfectly, complementary to each other. The virus genome contains seven open reading frames. Database searches with the putative viral products revealed homologies with the RNA-dependent RNA polymerases of rhabdoviruses and Ranunculus white mottle virus, and the capsid protein of Citrus psorosis virus. The gene encoding the viral polymerase appears to be located on RNA segment 1, while the nucleocapsid protein is encoded by RNA3. No significant sequence similarities were observed with other viral proteins. In spite of the morphological resemblance to species in the genus Tenuivirus, the ophioviruses appear not to be evolutionarily closely related to this genus or to any other viral genus