195 research outputs found

    ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment

    Background: There is an increasing demand to assemble and align large-scale biological sequence data sets. The commonly used multiple sequence alignment programs are still limited in their ability to handle very large numbers of sequences because they lack a scalable high-performance computing (HPC) environment with greatly extended data storage capacity. Results: We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pair-align computation. The computational efficiency of the disk-storage system was markedly improved by implementing an efficient load-balancing algorithm, called the "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets. Conclusions: ClustalXeed can now process large volumes of biological sequence data that were not tractable in any other parallel or single MSA program. The main developments include: 1) the ability to tackle larger sequence alignment problems than possible with previous systems through markedly improved storage-handling capabilities; 2) an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment with input sequences of non-uniform length; and 3) support for both single PC and distributed cluster systems.

    Predicting residue contacts using pragmatic correlated mutations method: reducing the false positives

    BACKGROUND: Predicting residue contacts using the primary amino acid sequence alone is an important task that can guide 3D structure modeling and can verify the quality of predicted 3D structures. The correlated mutations (CM) method serves as the most promising approach, and it has been used to predict amino acid pairs that are distant in the primary sequence but form contacts in the native 3D structure of homologous proteins. RESULTS: Here we report a new implementation of the CM method with an added set of selection rules (filters). The parameters of the algorithm were optimized against fifteen high resolution crystal structures with an optimization criterion that maximized the confidence of the predictions. The optimization resulted in a true positive ratio (TPR) of 0.08 for the CM without filters and a TPR of 0.14 for the CM with filters. The protocol was further benchmarked against 65 high resolution structures that were not included in the optimization test. The benchmarking resulted in a TPR of 0.07 for the CM without filters and a TPR of 0.09 for the CM with filters. CONCLUSION: Thus, the inclusion of selection rules resulted in an overall improvement of 30%. In addition, the pair-wise comparison of TPR for each protein without and with filters resulted in an average improvement of 1.7. The methodology was implemented in a web server that is freely available to the public. The purpose of this implementation is to provide 3D structure predictors with a tool that can help rank alternative models by satisfying the largest number of predicted contacts, and that can provide a confidence score for contacts in cases where the structure is known.

    Inter-Homolog Crossing-Over and Synapsis in Arabidopsis Meiosis Are Dependent on the Chromosome Axis Protein AtASY3

    In this study we have analysed AtASY3, a coiled-coil domain protein that is required for normal meiosis in Arabidopsis. Analysis of an Atasy3-1 mutant reveals that loss of the protein compromises chromosome axis formation and results in reduced numbers of meiotic crossovers (COs). Although the frequency of DNA double-strand breaks (DSBs) appears moderately reduced in Atasy3-1, the main recombination defect is a reduction in the formation of COs. Immunolocalization studies in wild-type meiocytes indicate that the HORMA protein AtASY1, which is related to Hop1 in budding yeast, forms hyper-abundant domains along the chromosomes that are spatially associated with DSBs and early recombination pathway proteins. Loss of AtASY3 disrupts the axial organization of AtASY1. Furthermore, we show that the AtASY3 and AtASY1 homologs BoASY3 and BoASY1, from the closely related species Brassica oleracea, are co-immunoprecipitated from meiocyte extracts and that AtASY3 interacts with AtASY1 via residues in its predicted coiled-coil domain. Together our results suggest that AtASY3 is a functional homolog of Red1. Since studies in budding yeast indicate that Red1 and Hop1 play a key role in establishing a bias to favor inter-homolog recombination (IHR), we propose that AtASY3 and AtASY1 may have a similar role in Arabidopsis. Loss of AtASY3 also disrupts synaptonemal complex (SC) formation. In Atasy3-1 the transverse filament protein AtZYP1 forms small patches rather than a continuous SC. The few AtMLH1 foci that remain in Atasy3-1 are found in association with the AtZYP1 patches. This is sufficient to prevent the ectopic recombination observed in the absence of AtZYP1, thus emphasizing that in addition to its structural role the protein is important for CO formation.

    CLUSS: Clustering of protein sequences based on a new similarity measure

    Background: The rapid burgeoning of available protein data makes the use of clustering within families of proteins increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects the biological reality, and an accurate estimate of protein sequence similarity is crucial to the building of such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for the alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is especially designed for application to non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "phylogenetic" in the sense of "relatedness of biological functions". Results: To show the effectiveness of CLUSS, we performed an extensive clustering on the COG database. To demonstrate its ability to deal with hard-to-align sequences, we tested it on the GH2 family. In addition, we carried out experimental comparisons of CLUSS with a variety of mainstream algorithms. These comparisons were made on hard-to-align and easy-to-align protein sequences. The results of these experiments show the superiority of CLUSS in yielding clusters of proteins with similar functional activity. Conclusion: We have developed an effective method and tool for clustering protein sequences to meet the needs of biologists in terms of phylogenetic analysis and prediction of biological functions. Compared to existing clustering methods, CLUSS more accurately highlights the functional characteristics of the clustered families. It provides biologists with a new and plausible instrument for the analysis of protein sequences, especially those that cause problems for the alignment-dependent algorithms.

    Evolution of Reproductive Morphology in Leaf Endophytes

    The endophytic lifestyle has played an important role in the evolution of the morphology of reproductive structures (body) in one of the most problematic groups in fungal classification, the Leotiomycetes (Ascomycota). Mapping fungal morphologies to two groups in the Leotiomycetes, the Rhytismatales and Hemiphacidiaceae, reveals significant divergence in body size, shape and complexity. Mapping ecological roles to these taxa reveals that the groups include endophytic fungi living on leaves and saprobic fungi living on duff or dead wood. Finally, mapping the morphologies to ecological roles reveals that leaf endophytes produce small, highly reduced fruiting bodies covered with fungal tissue or dead host tissue, while saprobic species produce large and intricate fruiting bodies. Intriguingly, the resemblance between asexual conidiomata and sexual ascomata in some Leotiomycetes implicates some common developmental pathways for sexual and asexual development in these fungi.

    MISHIMA - a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data

    Background: Large nucleotide sequence datasets are becoming increasingly common objects of comparison. Complete bacterial genomes are reported almost every day. This creates challenges for developing new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches have performance problems when the number of sequences is large and when dealing with genome-scale sequences. Results: We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all sequences. A divide-and-conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are assembled to form a complete alignment of the original sequences. Conclusions: MISHIMA provides improved performance compared to the commonly used multiple alignment methods. As an example, six complete genome sequences of the bacterial species Helicobacter pylori (about 1.7 Mb each) were successfully aligned in about 6 hours using a single PC.
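
    The anchor-and-split idea described in this abstract can be illustrated with a minimal sketch. This is not MISHIMA's actual algorithm, which the abstract does not specify. It assumes that "rare" means a k-mer occurring exactly once in every sequence, and it uses a simple greedy pass to keep anchors in the same order across sequences. The function names `unique_kmers`, `shared_anchors`, and `split_at_anchors` are hypothetical.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {seq[i:i + k]: i
            for i in range(len(seq) - k + 1)
            if counts[seq[i:i + k]] == 1}


def shared_anchors(seqs, k):
    """Return anchors as (kmer, [start position in each sequence]).

    Assumption: an anchor is a k-mer unique within every sequence.
    A greedy pass keeps only anchors that are collinear and
    non-overlapping in all sequences; it is not guaranteed to find
    the longest possible chain.
    """
    tables = [unique_kmers(s, k) for s in seqs]
    common = set(tables[0]).intersection(*tables[1:])
    candidates = sorted(common, key=lambda km: tables[0][km])
    anchors = []
    last = [-k] * len(seqs)  # lets an anchor at position 0 be accepted
    for km in candidates:
        pos = [t[km] for t in tables]
        if all(p >= prev + k for p, prev in zip(pos, last)):
            anchors.append((km, pos))
            last = pos
    return anchors


def split_at_anchors(seqs, anchors, k):
    """Cut every sequence into the fragments lying between anchors."""
    blocks = []
    starts = [0] * len(seqs)
    for _, pos in anchors:
        blocks.append([s[a:p] for s, a, p in zip(seqs, starts, pos)])
        starts = [p + k for p in pos]
    blocks.append([s[a:] for s, a in zip(seqs, starts)])
    return blocks


seqs = ["AAGATTACACC", "TTGATTACAGG", "CGATTACAT"]
anchors = shared_anchors(seqs, 7)
blocks = split_at_anchors(seqs, anchors, 7)
```

    Each entry of `blocks` holds one fragment per input sequence. Under these assumptions, every such block could be passed to an external aligner independently and the partial alignments joined at the anchors.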

    Supervised multivariate analysis of sequence groups to identify specificity determining residues

    Background: Proteins that evolve from a common ancestor can change functionality over time, and it is important to be able to identify the residues that cause this change. In this paper we show how a supervised multivariate statistical method, Between Group Analysis (BGA), can be used to identify these residues from families of proteins with different substrate specificities using multiple sequence alignments. Results: We demonstrate the usefulness of this method on three different test cases. Two of these test cases, the Lactate/Malate dehydrogenase family and Nucleotidyl Cyclases, consist of two functional groups. The other family, Serine Proteases, consists of three groups. BGA was used to analyse and visualise these three families using two different encoding schemes for the amino acids. Conclusion: The overall combination of methods in this paper is powerful and flexible while being computationally very fast and simple. BGA is especially useful because it can be used to analyse any number of functional classes. In the examples in this paper, we used only 2 or 3 classes for demonstration purposes, but any number can be used and visualised.

    XplorSeq: A software environment for integrated management and phylogenetic analysis of metagenomic sequence data

    Background: Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, a need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects. Results: XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS-X-based graphical user interface (GUI). Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly), perform BLAST (Basic Local Alignment Search Tool) searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which can then be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file. Conclusion: XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to almost any sequencing project. The application is available free of charge for non-commercial use at http://vent.colorado.edu/phyloware.

    Identifying and Characterizing a Novel Protein Kinase STK35L1 and Deciphering Its Orthologs and Close-Homologs in Vertebrates

    The human kinome, containing 478 eukaryotic protein kinases, has over 100 uncharacterized kinases with unknown substrates and biological functions. The Ser/Thr kinase 35 (STK35, Clik1) is a member of the NKF 4 (New Kinase Family 4) in the kinome with unknown substrates and biological functions. Various high throughput studies indicate that STK35 could be involved in various human diseases such as colorectal cancer and malaria. In this study, we found that the previously published coding sequence of the STK35 gene is incomplete. The newly identified sequence of the STK35 gene codes for a protein of 534 amino acids with an N-terminal elongation of 133 amino acids. It has been designated as STK35L (STK35 long). Because it is the first of several homologous kinases, we termed it STK35L1. The STK35L1 protein (58 kDa on SDS-PAGE), but not STK35 (44 kDa), was found to be expressed in all human cells studied (endothelial cells, HeLa, and HEK cells) and was down-regulated after silencing with specific siRNA. EGFP-STK35L1 was localized in the nucleus and the nucleolus. By combining syntenic and gene structure pattern data and homology searches, two further STK35L1 homologs, STK35L2 (previously known as PDIK1L) and STK35L3, were found. All these protein kinase homologs were conserved throughout the vertebrates. The STK35L3 gene was specifically lost during placental mammalian evolution. Using comparative genomics, we have identified orthologous sets of these three protein kinase genes and their possible ancestor gene in two sea squirt genomes. We found the full-length coding sequence of the STK35 gene and termed it STK35L1. We identified a new third STK35-like gene, STK35L3, in vertebrates and a possible ancestor gene in the sea squirt genome. This study will provide a comprehensive platform to explore the role of STK35L kinases in cell functions and human diseases.