6 research outputs found

    Towards Alignment Independent Quantitative Assessment of Homology Detection

    Identification of homologous proteins provides a basis for protein annotation. Sequence alignment tools reliably identify homologs sharing high sequence similarity. However, identification of homologs that share low sequence similarity remains a challenge. Lowering the cutoff value could enable the identification of diverged homologs, but also introduces numerous false hits. Methods are being continuously developed to minimize this problem. Estimation of the fraction of homologs in a set of protein alignments can help in the assessment and development of such methods, and provides users with an intuitive quantitative assessment of protein alignment results. Herein, we present a computational approach that estimates the number of homologs in a set of protein pairs. The method requires a prevalent and detectable protein feature that is conserved between homologs. By analyzing the feature prevalence in a set of pairwise protein alignments, the method can estimate the number of homolog pairs in the set independently of the alignments' quality. Using the HomoloGene database as a standard of truth, we implemented this approach in a proteome-wide analysis. The results revealed that this approach, which is independent of the alignments themselves, works well for estimating the number of homologous proteins in a wide range of homology values. In summary, the presented method can accompany homology searches and method development, provides validation to search results, and allows tuning of tools and methods.

    Signal peptide is a conserved feature of homologous proteins.

    Homologous proteins of human/mouse, human/fruit fly, and human/C. elegans were extracted from HomoloGene triplets. The proportion of matching protein pairs, in which both proteins have or both lack a signal peptide, was calculated for each pair of organisms (gray columns). For comparison, the expected proportion (black columns) was calculated under the assumption that signal peptides are not conserved between homologs, using the prevalence of proteins having signal peptides in each organism.
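The expected proportion in this caption follows from treating signal peptide status as independent between the two proteins of a pair. A minimal sketch of that arithmetic in Python; the function name and the prevalence values are illustrative assumptions, not figures from the study:

```python
def expected_match_fraction(prev_a: float, prev_b: float) -> float:
    """Fraction of pairs expected to agree on signal peptide status
    (both have one, or both lack one) if status were independent
    between the two proteins of a pair.

    prev_a, prev_b: prevalence of signal-peptide proteins in each organism.
    """
    for p in (prev_a, prev_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("prevalence must lie in [0, 1]")
    # P(both have) + P(both lack)
    return prev_a * prev_b + (1.0 - prev_a) * (1.0 - prev_b)


# Hypothetical prevalences, chosen only to illustrate the calculation.
expected = expected_match_fraction(0.2, 0.25)
```

Comparing this chance-level value with the observed matching proportion is what indicates whether the feature is conserved between homologs.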

    Robustness of the Fhom Estimator.

    Fhom was estimated to be 94% for HomoloGene human/mouse pairs. Next, random protein pairs were added according to the original signal peptide prevalence of these organisms. A linear regression curve and the coefficient of determination (R²) confirm that the Fhom Estimator has a wide dynamic range, and is robust to variation in signal-to-noise ratio.
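One way to picture an estimator of this kind is as a two-component mixture: homologous pairs agree on the feature at one rate, random pairs agree at the chance rate, and the observed agreement is a weighted average of the two. The sketch below is an assumption about the general shape of such a calculation, not the published definition of the Fhom Estimator. The function names, the default `match_homologs=1.0`, and the numbers are all hypothetical.

```python
def estimate_fhom(observed_match: float,
                  expected_random: float,
                  match_homologs: float = 1.0) -> float:
    """Solve observed = f * match_homologs + (1 - f) * expected_random for f.

    observed_match:  fraction of pairs in the set that agree on the feature
    expected_random: agreement expected for unrelated pairs (chance level)
    match_homologs:  agreement assumed for true homologs (assumed, not sourced)
    """
    if match_homologs <= expected_random:
        raise ValueError("feature must agree more often in homologs than by chance")
    f = (observed_match - expected_random) / (match_homologs - expected_random)
    # Clamp, since sampling noise can push the raw value outside [0, 1].
    return min(1.0, max(0.0, f))


def diluted_match(f_hom: float,
                  expected_random: float,
                  match_homologs: float = 1.0) -> float:
    """Observed agreement for a set in which a fraction f_hom are homologs."""
    return f_hom * match_homologs + (1.0 - f_hom) * expected_random


# Hypothetical dilution series: mix random pairs into a homolog set and
# check that the estimate tracks the true fraction.
chance = 0.65
for true_f in (1.0, 0.75, 0.5, 0.25, 0.0):
    est = estimate_fhom(diluted_match(true_f, chance), chance)
    print(f"true fraction {true_f:.2f} -> estimated {est:.2f}")
```

Under this model the observed agreement falls linearly as random pairs are added, which is why a linear regression is a natural check for a dilution experiment like the one in the caption.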

    Alignment independence of the Fhom Estimator.

    HomoloGene homologous protein pairs were divided into five groups according to the identity level of their alignment. The organisms used are human, mouse, fruit fly and C. elegans. The figure reveals that the fraction-of-homologs estimate (Fhom) is applicable to both closely related and distantly related protein pairs.

    Genome-Wide Analysis of C/D and H/ACA-Like Small Nucleolar RNAs in Leishmania major Indicates Conservation among Trypanosomatids in the Repertoire and in Their rRNA Targets

    Small nucleolar RNAs (snoRNAs) are a large group of noncoding RNAs that exist in eukaryotes and archaea and guide modifications such as 2′-O-ribose methylations and pseudouridylation on rRNAs and snRNAs. Recently, we described a genome-wide screening approach with Trypanosoma brucei that revealed over 90 guide RNAs. In this study, we extended this approach to analyze the repertoire of the closely related human pathogen Leishmania major. We describe 23 clusters that encode 62 C/Ds that can potentially guide 79 methylations and 37 H/ACA-like RNAs that can potentially guide 30 pseudouridylation reactions. Like T. brucei, Leishmania also contains many modifications and guide RNAs relative to its genome size. This study describes 10 H/ACAs and 14 C/Ds that were not found in T. brucei. Mapping of 2′-O-methylations in rRNA regions rich in modifications suggests the existence of trypanosomatid-specific modifications conserved in T. brucei and Leishmania. Structural features of C/D snoRNAs, such as copy number, conservation of boxes, K turns, and intragenic and extragenic base pairing, were examined to elucidate the great variation in snoRNA abundance. This study highlights the power of comparative genomics for determining conserved features of noncoding RNAs.