10,103 research outputs found

    Alkane hydroxylase genes in psychrophile genomes and the potential for cold active catalysis.

    Background: Psychrophiles are presumed to play a large role in the catabolism of alkanes and other components of crude oil in natural low temperature environments. In this study we analyzed the functional diversity of genes for alkane hydroxylases, the enzymes responsible for converting alkanes to more labile alcohols, as found in the genomes of nineteen psychrophiles for which alkane degradation has not been reported. To identify possible mechanisms of low temperature optimization we compared putative alkane hydroxylases from these psychrophiles with homologues from nineteen taxonomically related mesophilic strains. Results: Seven of the analyzed psychrophile genomes contained a total of 27 candidate alkane hydroxylase genes, only two of which are currently annotated as alkane hydroxylase. These candidates were mostly related to the AlkB and cytochrome P450 alkane hydroxylases, but several homologues of the LadA and AlmA enzymes, significant for their ability to degrade long-chain alkanes, were also detected. These putative alkane hydroxylases showed significant differences in primary structure from their mesophile homologues, with preferences for specific amino acids and increased flexibility in loops, bends, and α-helices. Conclusion: A focused analysis of psychrophile genomes led to the discovery of numerous candidate alkane hydroxylase genes not currently annotated as alkane hydroxylase. Gene products show signs of optimization to low temperature, including regions of increased flexibility and amino acid preferences typical of psychrophilic proteins. These findings are consistent with observations of microbial degradation of crude oil in cold environments and identify proteins that can be targeted in rate studies and in the design of molecular tools for low temperature bioremediation.

    Predicting protein function with hierarchical phylogenetic profiles: The Gene3D phylo-tuner method applied to eukaryotic Genomes

    "Phylogenetic profiling" is based on the hypothesis that, during evolution, functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence-absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step, and it largely explains why previous work in this area has mainly focused on using presence-absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence-absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses the genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity, from 30% to 100%, and phylogenetic profiles are built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in their domain copy number. We demonstrate that two protein families will "auto-tune" with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by conventional presence-absence profile comparisons, and it does not require any fixed a priori criteria to define orthologous genes.

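    The profile comparison described in the abstract above can be illustrated with a minimal Python sketch. The abstract does not say how the Euclidean distance is normalised, so z-scoring each copy-number profile across genomes is an assumption made here, and the names `normalise`, `profile_distance`, and `best_level_pair` are hypothetical helpers, not part of Gene3D or the published method.

```python
import math


def normalise(profile):
    """Scale a copy-number profile to zero mean and unit variance.

    Assumption: the abstract only says "normalised Euclidean distances";
    z-scoring is one plausible choice, not the confirmed one.
    """
    n = len(profile)
    mean = sum(profile) / n
    var = sum((x - mean) ** 2 for x in profile) / n
    if var == 0:
        # A flat profile carries no co-variation signal; map it to zeros.
        return [0.0] * n
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in profile]


def profile_distance(a, b):
    """Euclidean distance between two normalised copy-number profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    na, nb = normalise(a), normalise(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)))


def best_level_pair(profiles_a, profiles_b):
    """Compare two families at every pair of sequence-identity levels.

    Each argument maps an identity level (e.g. 30, 60) to the family's
    copy-number profile at that level. Returns the tuple
    (distance, level_a, level_b) with the smallest distance.
    """
    return min(
        (profile_distance(pa, pb), la, lb)
        for la, pa in profiles_a.items()
        for lb, pb in profiles_b.items()
    )
```

    In this sketch, two families whose copy numbers rise and fall together across genomes get a distance near zero at the identity levels where that co-variation appears, which is the sense in which the profiles "auto-tune".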
    Pairwise alignment incorporating dipeptide covariation

    Motivation: Standard algorithms for pairwise protein sequence alignment make the simplifying assumption that amino acid substitutions at neighboring sites are uncorrelated. This assumption allows implementation of fast algorithms for pairwise sequence alignment, but it ignores information that could conceivably increase the power of remote homolog detection. We examine the validity of this assumption by constructing extended substitution matrices that encapsulate the observed correlations between neighboring sites, by developing an efficient and rigorous algorithm for pairwise protein sequence alignment that incorporates these local substitution correlations, and by assessing the ability of this algorithm to detect remote homologies. Results: Our analysis indicates that local correlations between substitutions are not strong on average. Furthermore, incorporating local substitution correlations into pairwise alignment did not lead to a statistically significant improvement in remote homology detection. Therefore, the standard assumption that individual residues within protein sequences evolve independently of neighboring positions appears to be an efficient and appropriate approximation.

    Comparing Fifty Natural Languages and Twelve Genetic Languages Using Word Embedding Language Divergence (WELD) as a Quantitative Measure of Language Distance

    We introduce a new measure of distance between languages based on word embedding, called word embedding language divergence (WELD). WELD is defined as the divergence between the unified similarity distributions of words in different languages. Using this measure, we perform language comparison for fifty natural languages and twelve genetic languages. Our natural language dataset is a collection of sentence-aligned parallel corpora from Bible translations for fifty languages spanning a variety of language families. Although we use parallel corpora, which guarantees the same content in all languages, in many cases languages within the same family still cluster together. In addition to natural languages, we perform language comparison for the coding regions in the genomes of twelve different organisms (four plants, six animals, and two human subjects). Our result confirms a significant high-level difference in the genetic language model of humans/animals versus plants. The proposed method is a step toward defining a quantitative measure of similarity between languages, with applications in language classification, genre identification, dialect identification, and evaluation of translations.