
    Gene-oriented ortholog database: a functional comparison platform for orthologous loci

    The accumulation of complete genomic sequences increases the need for functional annotation. Transferring existing functional annotation from orthologs can speed up the annotation process and even help to verify existing annotation. However, current protein sequence-based ortholog databases provide ambiguous and incomplete orthology in eukaryotes. This is because isoforms derived by alternative splicing (AS) often share high sequence similarity, which interferes with sequence-based identification. The Gene-Oriented Ortholog Database (GOOD) uses the genomic locations of transcripts to cluster AS-derived isoforms before ortholog delineation, eliminating the interference from AS. In this gene-oriented presentation, isoforms are clearly associated with their genes, which provides comprehensive ortholog information and further distinguishes them from paralogs. In addition, displaying clusters of isoforms between orthologous genes can reveal evolutionary variation at the transcriptional level. Based on orthology, GOOD also incorporates functional annotation from the Gene Ontology (GO) database. However, the GO database contains redundant annotations, in which both a parent term and its child term are assigned to the same gene. This redundancy makes it difficult to compare term counts numerically between orthologous genes. Rather than providing term descriptions alone, GOOD also provides GO graphs that reveal the hierarchical relationships among divergent functions. As a result, redundant GO terms can be identified, and the compared terms are shown in a more comprehensive context. In sum, GOOD can improve the interpretation of molecular function from experiments in model organisms and provide clear comparative genomic annotation across organisms.
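
    Clustering AS-derived isoforms by genomic location can be approximated by merging transcripts whose genomic spans overlap on the same chromosome and strand. The Python sketch below works under that assumption only; GOOD's actual clustering criteria are not given in the abstract, and the `Transcript` record and `cluster_isoforms` function are hypothetical. Plain span overlap would, for example, wrongly merge distinct genes nested on the same strand.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    tid: str      # transcript identifier (hypothetical field names)
    chrom: str    # chromosome name
    strand: str   # "+" or "-"
    start: int    # genomic start coordinate, inclusive
    end: int      # genomic end coordinate, inclusive


def cluster_isoforms(transcripts):
    """Group transcripts whose spans overlap on the same chromosome and strand."""
    clusters = []
    ordered = sorted(transcripts, key=lambda t: (t.chrom, t.strand, t.start, t.end))
    current, cur_key, cur_end = [], None, None
    for t in ordered:
        key = (t.chrom, t.strand)
        if current and key == cur_key and t.start <= cur_end:
            # Overlaps the running cluster: treat as another isoform of the same locus.
            current.append(t.tid)
            cur_end = max(cur_end, t.end)
        else:
            if current:
                clusters.append(current)
            current, cur_key, cur_end = [t.tid], key, t.end
    if current:
        clusters.append(current)
    return clusters
```

    Each resulting cluster would then stand in for one gene when orthologs are delineated, so isoforms of the same locus cannot be mistaken for separate paralogs.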

    The partially alternating ternary sum in an associative dialgebra

    The alternating ternary sum in an associative algebra, $abc - acb - bac + bca + cab - cba$, gives rise to the partially alternating ternary sum in an associative dialgebra with products $\dashv$ and $\vdash$ by making the argument $a$ the center of each term: $a \dashv b \dashv c - a \dashv c \dashv b - b \vdash a \dashv c + c \vdash a \dashv b + b \vdash c \vdash a - c \vdash b \vdash a$. We use computer algebra to determine the polynomial identities in degree $\le 9$ satisfied by this new trilinear operation. In degrees 3 and 5 we obtain $[a,b,c] + [a,c,b] \equiv 0$ and $[a,[b,c,d],e] + [a,[c,b,d],e] \equiv 0$; these identities define a new variety of partially alternating ternary algebras. We show that there is a 49-dimensional space of multilinear identities in degree 7, and we find equivalent nonlinear identities. We use the representation theory of the symmetric group to show that there are no new identities in degree 9. Comment: 14 pages
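
    In the free associative dialgebra, a monomial can be written as a word with one marked letter, its center; $u \dashv v$ keeps the center of $u$ and $u \vdash v$ keeps the center of $v$. Assuming this normal form, the Python sketch below expands the partially alternating ternary sum and checks the degree-3 and degree-5 identities on generators. It is an illustrative check, not the computer algebra used by the authors, and the helper names (`dl`, `dr`, `tri`, `combine`) are hypothetical.

```python
from collections import defaultdict

# An element is a dict {(word, center_index): coefficient}, where word is a tuple
# of generator names and center_index marks the position of the center letter.


def gen(x):
    """A single generator x, which is its own center."""
    return {((x,), 0): 1}


def _mul(u, v, keep_left_center):
    out = defaultdict(int)
    for (wu, cu), ku in u.items():
        for (wv, cv), kv in v.items():
            center = cu if keep_left_center else len(wu) + cv
            out[(wu + wv, center)] += ku * kv
    return {key: c for key, c in out.items() if c != 0}


def dl(u, v):
    """u ⊣ v: the product keeps the center of u."""
    return _mul(u, v, True)


def dr(u, v):
    """u ⊢ v: the product keeps the center of v."""
    return _mul(u, v, False)


def combine(*terms):
    """Linear combination of (coefficient, element) pairs, dropping zero terms."""
    out = defaultdict(int)
    for coeff, elem in terms:
        for key, c in elem.items():
            out[key] += coeff * c
    return {key: c for key, c in out.items() if c != 0}


def tri(a, b, c):
    """Partially alternating ternary sum [a, b, c]; a is the center of every term."""
    return combine(
        (1, dl(dl(a, b), c)),    # a ⊣ b ⊣ c
        (-1, dl(dl(a, c), b)),   # a ⊣ c ⊣ b
        (-1, dl(dr(b, a), c)),   # b ⊢ a ⊣ c
        (1, dl(dr(c, a), b)),    # c ⊢ a ⊣ b
        (1, dr(dr(b, c), a)),    # b ⊢ c ⊢ a
        (-1, dr(dr(c, b), a)),   # c ⊢ b ⊢ a
    )


a, b, c, d, e = (gen(x) for x in "abcde")
deg3 = combine((1, tri(a, b, c)), (1, tri(a, c, b)))
deg5 = combine((1, tri(a, tri(b, c, d), e)), (1, tri(a, tri(c, b, d), e)))
print(deg3 == {}, deg5 == {})  # prints: True True
```

    The degree-5 identity holds here because the middle argument never carries the center, so only its associative image, the fully alternating sum in $b, c, d$, matters, and that image changes sign when $b$ and $c$ are swapped.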

    MINE: Module Identification in Networks

    Background: Graphical models of network associations are useful for both visualizing and integrating multiple types of association data. Identifying modules, or groups of functionally related gene products, is an important challenge in analyzing biological networks. However, existing tools to identify modules are insufficient when applied to dense networks of experimentally derived interaction data. To address this problem, we have developed an agglomerative clustering method that is able to identify highly modular sets of gene products within highly interconnected molecular interaction networks. Results: MINE outperforms MCODE, CFinder, NEMO, SPICi, and MCL in identifying non-exclusive, high-modularity clusters when applied to the C. elegans protein-protein interaction network. The algorithm generally achieves superior geometric accuracy and modularity for annotated functional categories. In comparison with the most closely related algorithm, MCODE, the top clusters identified by MINE are consistently of higher density, and MINE is less likely to designate overlapping modules as a single unit. MINE offers a high level of granularity with a small number of adjustable parameters, enabling users to fine-tune cluster results for input networks with differing topological properties. Conclusions: MINE was created in response to the challenge of discovering high-quality modules of gene products within highly interconnected biological networks. The algorithm allows a high degree of flexibility and user customisation of results with few adjustable parameters. MINE outperforms several popular clustering algorithms in identifying modules with high modularity and obtains good overall recall and precision of functional annotations in protein-protein interaction networks from both S. cerevisiae and C. elegans.
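
    The comparison with MCODE rests on cluster density. A minimal Python sketch of the usual graph-density measure (edges present divided by possible pairs), assuming an undirected network given as an edge list; this is the common definition, not necessarily the exact score MINE or MCODE computes internally, and `cluster_density` is a hypothetical helper.

```python
from itertools import combinations


def cluster_density(cluster, edges):
    """Fraction of node pairs inside `cluster` that are joined by an edge."""
    nodes = set(cluster)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Store undirected edges as frozensets so (u, v) and (v, u) match; skip self-loops.
    edge_set = {frozenset(edge) for edge in edges if len(set(edge)) == 2}
    present = sum(1 for u, v in combinations(nodes, 2) if frozenset((u, v)) in edge_set)
    return present / (n * (n - 1) / 2)
```

    A fully connected cluster scores 1.0; adding a weakly attached protein lowers the score, which is why denser top clusters indicate tighter modules.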

    Development of a single tube 640-plex genotyping method for detection of nucleic acid variations on microarrays

    Detection of DNA sequence variation is critical to biomedical applications, including genetic identification of disease, diagnosis and treatment, drug discovery and forensic analysis. Here, we describe an arrayed primer extension-based genotyping method (APEX-2) that allows multiplex (640-plex) DNA amplification and detection of single nucleotide polymorphisms (SNPs) and mutations on microarrays via four-color single-base primer extension. The founding principle of APEX-2 multiplex PCR is that two oligonucleotides per SNP/mutation generate amplicons containing the position of interest. The same oligonucleotides are then used as immobilized single-base extension primers on a microarray. The method described here is ideal for SNP or mutation detection, molecular diagnostics and forensic analysis. This robust genetic test has minimal requirements per targeted site (two primers, two spots on the microarray and a low-cost four-color detection system) and provides an advantageous alternative to both high-density platforms and low-density detection systems.
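
    In four-color single-base extension, each immobilized primer is extended by one dye-labeled terminator, so the dye read at a spot identifies the base at the SNP. The Python sketch below calls a genotype from the two spots per site, assuming one primer reports the forward strand and the other the reverse strand (so its base is complemented), and assuming a hypothetical relative-signal threshold of 0.3. The actual APEX-2 calling rules are not described in the abstract.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_bases(signals, min_fraction=0.3):
    """Bases whose dye signal is at least min_fraction of the strongest signal."""
    top = max(signals.values())
    if top <= 0:
        return set()
    return {base for base, s in signals.items() if s >= min_fraction * top}


def call_genotype(forward_spot, reverse_spot, min_fraction=0.3):
    """Combine the forward-strand spot and the reverse-strand spot for one site.

    Returns a genotype such as "A/G", or None when the two strands disagree.
    """
    fwd = call_bases(forward_spot, min_fraction)
    # The reverse primer reads the opposite strand, so complement its calls.
    rev = {COMPLEMENT[b] for b in call_bases(reverse_spot, min_fraction)}
    if not fwd or fwd != rev:
        return None
    alleles = sorted(fwd)
    if len(alleles) == 1:
        alleles = alleles * 2  # homozygous site, e.g. "A/A"
    return "/".join(alleles)
```

    Requiring both strands to agree is one way the second spot per site can guard against a miscall on a single primer.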

    Rich-Club Phenomenon in the Interactome of P. falciparum—Artifact or Signature of a Parasitic Life Style?

    Recent advances have provided a first experimental protein interaction map of the human malaria parasite P. falciparum, which appears to be remotely related to the interactomes of other eukaryotes. Here, we present a comparative topological analysis of this experimentally determined web and a network of conserved interactions between proteins in S. cerevisiae, C. elegans and D. melanogaster that have an ortholog in Plasmodium. Focusing on experimental interactions, we find a significant presence of a “rich club,” a topological characteristic in which an “oligarchy” of highly connected proteins is intertwined with one another. In complete contrast, the network of interologs, and particularly the web of evolutionarily conserved interactions in P. falciparum, lack this feature. This observation raises the question of whether the result reflects a topological signature of the parasite's biology, since the experimentally obtained interactions widely cover parasite-specific functions. Significantly, the hub proteins that form this oligarchy revolve around invasion functions, shaping an island of parasite-specific activities in a sea of evolutionarily inherited interactions. The presence of this biologically unprecedented network feature in the human malaria parasite might be an artifact of the data quality and of the methods used to obtain interaction data in this organism. Yet the observation that rich-club proteins have distinctive and statistically significant functions that revolve around parasite-specific activities points to a topological signature of a parasitic life style.
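
    A rich club is usually quantified with the rich-club coefficient phi(k) = 2 E_k / (N_k (N_k - 1)), the density of links among the N_k nodes whose degree exceeds k. The Python sketch below computes this standard unnormalized coefficient; the abstract does not state the authors' exact procedure, and the usual significance test against degree-preserving randomized networks is omitted. The function name is hypothetical.

```python
from collections import defaultdict


def rich_club_coefficient(edges, k):
    """Unnormalized rich-club coefficient phi(k) of an undirected graph.

    phi(k) = 2 * E_k / (N_k * (N_k - 1)), where N_k is the number of nodes with
    degree > k and E_k is the number of edges among those nodes.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            adj[u].add(v)
            adj[v].add(u)
    rich = {node for node, nbrs in adj.items() if len(nbrs) > k}
    n = len(rich)
    if n < 2:
        return 0.0
    # Each edge between rich nodes is seen from both ends, so halve the count.
    e = sum(1 for u in rich for v in adj[u] if v in rich) // 2
    return 2 * e / (n * (n - 1))
```

    Hubs that all interact with each other give phi(k) = 1.0 at a high k, the "oligarchy" pattern; a star-like network whose hubs never touch stays near 0.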

    FastBLAST: Homology Relationships for Millions of Proteins

    Background: All-versus-all BLAST, which searches for homologous pairs of sequences in a database of proteins, is used to identify potential orthologs, to find new protein families, and to provide rapid access to these homology relationships. As DNA sequencing accelerates and data sets grow, all-versus-all BLAST has become computationally demanding. Methodology/principal findings: We present FastBLAST, a heuristic replacement for all-versus-all BLAST that relies on alignments of proteins to known families, obtained from tools such as PSI-BLAST and HMMer. FastBLAST avoids most of the work of all-versus-all BLAST by taking advantage of these alignments and by clustering similar sequences. FastBLAST runs in two stages: the first stage identifies additional families and aligns them, and the second stage quickly identifies the homologs of a query sequence, based on the alignments of the families, before generating pairwise alignments. On 6.53 million proteins from the non-redundant GenBank database ("NR"), FastBLAST identifies new families 25 times faster than all-versus-all BLAST. Once the first stage is completed, FastBLAST identifies homologs for the average query in less than 5 seconds (8.6 times faster than BLAST) and gives nearly identical results. For hits above 70 bits, FastBLAST identifies 98% of the top 3,250 hits per query. Conclusions/significance: FastBLAST enables research groups that do not have supercomputers to analyze large protein sequence data sets. FastBLAST is open source software and is available at http://microbesonline.org/fastblast
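
    The second-stage saving comes from comparing a query only against proteins that share a family alignment with it, rather than against the whole database. The Python sketch below shows that candidate filtering, assuming family assignments arrive as (protein, family) pairs and that `score` stands in for a real pairwise alignment bit score. The function names are hypothetical, and this is not FastBLAST's own code.

```python
from collections import defaultdict


def build_family_index(assignments):
    """Map each family ID to the set of protein IDs aligned to it."""
    index = defaultdict(set)
    for protein, family in assignments:
        index[family].add(protein)
    return dict(index)


def find_homologs(query, query_families, family_index, score, min_score):
    """Score only proteins that share a family with the query.

    `score(query, protein)` is a placeholder for a pairwise alignment bit score.
    Returns (protein, score) pairs at or above min_score, best first.
    """
    candidates = set()
    for family in query_families:
        candidates |= family_index.get(family, set())
    candidates.discard(query)
    hits = [(p, score(query, p)) for p in candidates]
    return sorted((h for h in hits if h[1] >= min_score), key=lambda h: (-h[1], h[0]))
```

    Proteins outside the query's families are never aligned at all, which is where most of the work of all-versus-all BLAST is avoided; the cost is that homologs missing from every shared family are not found.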

    DODO: an efficient orthologous genes assignment tool based on domain architectures. Domain based ortholog detection

    Background: Orthologs are genes derived from the same ancestral gene locus through speciation. Orthologous proteins usually have similar sequences and perform comparable biological functions, so ortholog identification is useful for annotating newly sequenced genomes. With the rapidly increasing number of sequenced genomes, constructing or updating ortholog relationships among all genomes requires considerable effort and computation time. In addition, elucidating ortholog relationships between distantly related genomes is challenging because of their lower sequence similarity. An efficient ortholog detection method that can handle a large number of distantly related genomes is therefore desirable. Results: In this study, we created DODO (DOmain based Detection of Orthologs), an efficient ortholog detection pipeline based on domain architectures. Because domain composition is usually directly related to protein function, DODO can facilitate ortholog detection across distantly related genomes. DODO works in two main steps. Starting from domain information, it first assigns proteins to groups according to their domain architectures and then identifies orthologs within those groups at much reduced complexity. DODO detects orthologs between two genomes in considerably less time than the traditional reciprocal best hit method, and the advantage is greater when a large number of genomes is analyzed. DODO's results are highly comparable with those of other known ortholog databases. Conclusions: DODO provides a new, efficient pipeline for detecting orthologs in a large number of genomes. In addition, a database built with DODO is easier to maintain and can be updated with relatively little effort. The DODO pipeline can be downloaded from http://140.109.42.19:16080/dodo_web/home.htm
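
    The two-step idea (group proteins by domain architecture, then look for orthologs only inside each group) can be sketched in Python. We assume an architecture is the ordered tuple of domain IDs, that `score(a, b)` is a hypothetical similarity function such as a BLAST bit score, and that orthologs within a group are called as reciprocal best hits; the abstract does not state DODO's exact within-group rule.

```python
from collections import defaultdict


def group_by_architecture(architectures):
    """architectures: protein ID -> (genome ID, ordered tuple of domain IDs)."""
    groups = defaultdict(lambda: defaultdict(list))
    for protein, (genome, domains) in architectures.items():
        groups[tuple(domains)][genome].append(protein)
    return groups


def orthologs(architectures, genome_a, genome_b, score):
    """Reciprocal best hits between two genomes, searched within architecture groups."""
    pairs = []
    for members in group_by_architecture(architectures).values():
        a_prots = members.get(genome_a, [])
        b_prots = members.get(genome_b, [])
        if not a_prots or not b_prots:
            continue  # no cross-genome comparison possible in this group
        best_b = {a: max(b_prots, key=lambda b: score(a, b)) for a in a_prots}
        best_a = {b: max(a_prots, key=lambda a: score(a, b)) for b in b_prots}
        pairs.extend((a, b) for a, b in best_b.items() if best_a[b] == a)
    return sorted(pairs)
```

    Because scores are computed only between proteins with the same architecture, the number of comparisons falls from every cross-genome pair to the much smaller within-group pairs.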

    Bacterial Lifestyle in a Deep-sea Hydrothermal Vent Chimney Revealed by the Genome Sequence of the Thermophilic Bacterium Deferribacter desulfuricans SSM1

    The complete genome sequence of the thermophilic sulphur-reducing bacterium Deferribacter desulfuricans SSM1, isolated from a hydrothermal vent chimney, has been determined. The genome comprises a single circular chromosome of 2 234 389 bp and a megaplasmid of 308 544 bp. Many genes encoded in the genome are most similar to genes of sulphur- or sulphate-reducing bacterial species within the Deltaproteobacteria. The reconstructed central metabolism indicates a heterotrophic lifestyle driven primarily by C1 to C3 organic compounds, e.g. formate, acetate and pyruvate, and also suggests that the inability to grow autotrophically via the reductive tricarboxylic acid cycle may be due to the lack of ATP-dependent citrate lyase. In addition, the genome encodes numerous genes for chemoreceptors, chemotaxis-like systems and signal transduction machinery. These signalling networks may be linked to this bacterium's versatile energy metabolism and may provide ecophysiological advantages that help D. desulfuricans SSM1 thrive in the physically and chemically fluctuating environments near hydrothermal vents. This is the first genome sequence from the phylum Deferribacteres.