923 research outputs found

    New algorithms and methods for protein and DNA sequence comparison

    Get PDF

    Phylogenetic analysis of the SAP30 family of transcriptional regulators reveals functional divergence in the domain that binds the nuclear matrix

    Get PDF
    Background: Deacetylation of histones plays a fundamental role in gene silencing, and this is mediated by a corepressor complex containing Sin3 as an essential scaffold protein. In this report we examine the evolution of two proteins in this complex, the Sin3-associated proteins SAP30L and SAP30, by using an archive of protein sequences from 62 species.Results: Our analysis indicates that in tetrapods SAP30L is more similar than SAP30 to the ancestral protein, and the two copies in this group originated by gene duplication which occurred after the divergence of Actinopterygii and Sarcopterygii about 450 million years ago (Mya). The phylogenetic analysis and biochemical experiments suggest that SAP30 has diverged functionally from the ancestral SAP30L by accumulating mutations that have caused attenuation of one of the original functions, association with the nuclear matrix. This function is mediated by a nuclear matrix association sequence, which consists of a conserved motif in the C-terminus and the adjacent nucleolar localization signal (NoLS).Conclusion: These results add further insight into the evolution and function of proteins of the SAP30 family, which share many characteristic with nuclear scaffolding proteins that are intimately involved in regulation of gene expression. Furthermore, SAP30L seems essential to eukaryotic biology, as it is found in animals, plants, fungi, as well as some taxa of unicellular eukaryotes

    ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment

    Get PDF
    Abstract Background There is an increasing demand to assemble and align large-scale biological sequence data sets. The commonly used multiple sequence alignment programs are still limited in their ability to handle very large amounts of sequences because the system lacks a scalable high-performance computing (HPC) environment with a greatly extended data storage capacity. Results We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pair-align computation. The computation efficiency of disk-storage system was markedly improved by implementing an efficient load-balancing algorithm, called "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets. Conclusions ClustalXeed can now compute a large volume of biological sequence data sets, which were not tractable in any other parallel or single MSA program. The main developments include: 1) the ability to tackle larger sequence alignment problems than possible with previous systems through markedly improved storage-handling capabilities. 2) Implementing an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment with input sequences of non-uniform length. 3) Support for both single PC and distributed cluster systems.</p

    MAP2: multiple alignment of syntenic genomic sequences

    Get PDF
    We describe a multiple alignment program named MAP2 based on a generalized pairwise global alignment algorithm for handling long, different intergenic and intragenic regions in genomic sequences. The MAP2 program produces an ordered list of local multiple alignments of similar regions among sequences, where different regions between local alignments are indicated by reporting only similar regions. We propose two similarity measures for the evaluation of the performance of MAP2 and existing multiple alignment programs. Experimental results produced by MAP2 on four real sets of orthologous genomic sequences show that MAP2 rarely missed a block of transitively similar regions and that MAP2 never produced a block of regions that are not transitively similar. Experimental results by MAP2 on six simulated data sets show that MAP2 found the boundaries between similar and different regions precisely. This feature is useful for finding conserved functional elements in genomic sequences. The MAP2 program is freely available in source code form at for academic use

    MISHIMA - a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Large nucleotide sequence datasets are becoming increasingly common objects of comparison. Complete bacterial genomes are reported almost everyday. This creates challenges for developing new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches have performance problems when the number of sequences is large and when dealing with genome scale sequences.</p> <p>Results</p> <p>We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all sequences. Divide and conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are assembled together to form a complete alignment of the original sequences.</p> <p>Conclusions</p> <p>MISHIMA provides improved performance compared to the commonly used multiple alignment methods. As an example, six complete genome sequences of bacteria species <it>Helicobacter pylori </it>(about 1.7 Mb each) were successfully aligned in about 6 hours using a single PC.</p

    Maize Streak Virus: diversity and virulence

    Get PDF
    Zea mays was first introduced to Africa in Ghana by Portuguese traders in the 16th century. The steady spread of maize cultivation since then has made it the most important cereal crop in Africa today. Whereas improved maize genotypes and agricultural techniques enable yearly yields above 10 tons hectare-1 in the developed world, yearly yields across Africa have remained low at about 1 ton hectare-1 in most countries. Although outmoded agricultural practices are the main reason for poor yields, maize pathogens inflict substantial additional losses. Of the many pathogens currently confronting maize farmers in Africa, Maize streak virus (MSV) is the most significant

    ReformAlign: improved multiple sequence alignments using a profile-based meta-alignment approach

    Get PDF
    Background: Obtaining an accurate sequence alignment is fundamental for consistently analyzing biological data. Although this problem may be efficiently solved when only two sequences are considered, the exact inference of the optimal alignment easily gets computationally intractable for the multiple sequence alignment case. To cope with the high computational expenses, approximate heuristic methods have been proposed that address the problem indirectly by progressively aligning the sequences in pairs according to their relatedness. These methods however are not flexible to change the alignment of an already aligned group of sequences in the view of new data, resulting thus in compromises on the quality of the deriving alignment. In this paper we present ReformAlign, a novel meta-alignment approach that may significantly improve on the quality of the deriving alignments from popular aligners. We call ReformAlign a meta-aligner as it requires an initial alignment, for which a variety of alignment programs can be used. The main idea behind ReformAlign is quite straightforward: at first, an existing alignment is used to construct a standard profile which summarizes the initial alignment and then all sequences are individually re-aligned against the formed profile. From each sequence-profile comparison, the alignment of each sequence against the profile is recorded and the final alignment is indirectly inferred by merging all the individual sub-alignments into a unified set. The employment of ReformAlign may often result in alignments which are significantly more accurate than the starting alignments. Results: We evaluated the effect of ReformAlign on the generated alignments from ten leading alignment methods using real data of variable size and sequence identity. The experimental results suggest that the proposed meta-aligner approach may often lead to statistically significant more accurate alignments. Furthermore, we show that ReformAlign results in more substantial improvement in cases where the starting alignment is of relatively inferior quality or when the input sequences are harder to align. Conclusions: The proposed profile-based meta-alignment approach seems to be a promising and computationally efficient method that can be combined with practically all popular alignment methods and may lead to significant improvements in the generated alignments

    Genetic sequences: tracing the mutations of a disease.

    Get PDF
    The entry is to address the mini challenge 3 in the IEEE VAST Challenge 2010. A patient, identified by Interpol as Nicolai Kuryakin, is admitted to a hospital in Paris after being removed from a flight to Moscow due to illness. The patient, now deceased, was admitted with an unidentified illness and later developed symptoms consistent with Drafa Fever. An autopsy confirmed the presence of the Drafa virus in the patient’s bloodstream. Visual analytics methods are applied to help to understand the evolution of the current Drafa virus outbreak, as it may shed some light on Nicolai’s contacts
    • …
    corecore