2,232 research outputs found

    Distances and classification of amino acids for different protein secondary structures

    Window profiles of amino acids in protein sequences are taken as a description of the amino acid environment. The relative entropy, or Kullback-Leibler distance, derived from the profiles is used as a measure of dissimilarity for comparing amino acids and secondary structure conformations. Distance matrices of amino acid pairs at different conformations are obtained, which display a non-negligible dependence of amino acid similarity on conformation. Based on the conformation-specific distances, a clustering analysis of amino acids is conducted. Comment: 15 pages, 8 figures
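
A minimal sketch of the kind of profile comparison this abstract describes, assuming each window profile is a list of per-position probability distributions. The function names, the symmetrization, and the toy two-letter alphabet are illustrative assumptions, not the authors' implementation.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) in bits between two discrete distributions.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps (an
    assumption made here) so that the logarithm stays finite.
    """
    return sum(pi * math.log2(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)

def profile_distance(profile_a, profile_b):
    """Sum a symmetrized KL divergence over all window positions.

    Symmetrizing is an assumption; the abstract only names the
    Kullback-Leibler distance.
    """
    return sum(kl_divergence(p, q) + kl_divergence(q, p)
               for p, q in zip(profile_a, profile_b))

# Toy profiles: three window positions over a two-letter alphabet.
profile_x = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
profile_y = [[0.5, 0.5], [0.6, 0.4], [0.3, 0.7]]
distance_xy = profile_distance(profile_x, profile_y)
```

A matrix of such distances over all amino acid pairs could then be passed to any standard hierarchical clustering routine.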

    High-throughput discovery of rare human nucleotide polymorphisms by Ecotilling

    Human individuals differ from one another at only ∼0.1% of nucleotide positions, but these single nucleotide differences account for most heritable phenotypic variation. Large-scale efforts to discover and genotype human variation have been limited to common polymorphisms. However, these efforts overlook rare nucleotide changes that may contribute to phenotypic diversity and genetic disorders, including cancer. Thus, there is an increasing need for high-throughput methods to robustly detect rare nucleotide differences. Toward this end, we have adapted the mismatch discovery method known as Ecotilling for the discovery of human single nucleotide polymorphisms. To increase throughput and reduce costs, we developed a universal primer strategy and implemented algorithms for automated band detection. Ecotilling was validated by screening 90 human DNA samples for nucleotide changes in 5 gene targets and by comparing results to public resequencing data. To increase throughput for discovery of rare alleles, we pooled samples 8-fold and found Ecotilling to be efficient relative to resequencing, with a false negative rate of 5% and a false discovery rate of 4%. We identified 28 new rare alleles, including some that are predicted to damage protein function. The detection of rare damaging mutations has implications for models of human disease

    A methodology for determining amino-acid substitution matrices from set covers

    We introduce a new methodology for the determination of amino-acid substitution matrices for use in the alignment of proteins. The new methodology is based on a pre-existing set cover on the set of residues and on the undirected graph that describes residue exchangeability given the set cover. For fixed functional forms indicating how to obtain edge weights from the set cover and, after that, substitution-matrix elements from weighted distances on the graph, the resulting substitution matrix can be checked for performance against some known set of reference alignments and for given gap costs. Finding the appropriate functional forms and gap costs can then be formulated as an optimization problem that seeks to maximize the performance of the substitution matrix on the reference alignment set. We give computational results on the BAliBASE suite using a genetic algorithm for optimization. Our results indicate that it is possible to obtain substitution matrices whose performance is either comparable to or surpasses that of several others, depending on the particular scenario under consideration

    Simplified amino acid alphabets based on deviation of conditional probability from random background

    The primitive data for deducing the Miyazawa-Jernigan contact energy or the BLOSUM score matrix consist of pair frequency counts. Each amino acid corresponds to a conditional probability distribution. Based on the deviation of this conditional probability from the random background, a scheme for reducing the amino acid alphabet is proposed. A clear discrepancy is observed between the reduced alphabets obtained from the raw Miyazawa-Jernigan and BLOSUM residue pair counts. Taking the homologous sequence database SCOP40 as a test set, we detect homology with the resulting coarse-grained substitution matrices. It is verified that the reduced alphabets preserve well the information contained in the original 20-letter alphabet. Comment: 9 pages, 3 figures

    Discovery of chemically induced mutations in rice by TILLING

    BACKGROUND: Rice is both a food source for a majority of the world's population and an important model system. Available functional genomics resources include targeted insertion mutagenesis and transgenic tools. While these can be powerful, a non-transgenic, unbiased targeted mutagenesis method that can generate a range of allele types would add considerably to the analysis of the rice genome. TILLING (Targeting Induced Local Lesions in Genomes), a general reverse genetic technique that combines traditional mutagenesis with high throughput methods for mutation discovery, is such a method. RESULTS: To apply TILLING to rice, we developed two mutagenized rice populations. One population was developed by treatment with the chemical mutagen ethyl methanesulphonate (EMS), and the other with a combination of sodium azide plus methyl-nitrosourea (Az-MNU). To find induced mutations, target regions of 0.7–1.5 kilobases were PCR amplified using gene specific primers labeled with fluorescent dyes. Heteroduplexes were formed through denaturation and annealing of PCR products, mismatches digested with a crude preparation of CEL I nuclease and cleaved fragments visualized using denaturing polyacrylamide gel electrophoresis. In 10 target genes screened, we identified 27 nucleotide changes in the EMS-treated population and 30 in the Az-MNU population. CONCLUSION: We estimate that the density of induced mutations is two- to threefold higher than previously reported rice populations (about 1/300 kb). By comparison to other plants used in public TILLING services, we conclude that the populations described here would be suitable for use in a large scale TILLING project

    VENN, a tool for titrating sequence conservation onto protein structures

    Residue conservation is an important, established method for inferring protein function, modularity and specificity. It is important to recognize that it is the 3D spatial orientation of residues that drives sequence conservation. Considering this, we have built a new computational tool, VENN, that allows researchers to interactively and graphically titrate sequence homology onto surface representations of protein structures. Our proposed titration strategies reveal critical details that are not readily identified using other existing tools. Analyses of a bZIP transcription factor and of receptor recognition of Fibroblast Growth Factor using VENN revealed key specificity determinants. Weblink: http://sbtools.uchc.edu/venn/

    Optimal neighborhood indexing for protein similarity search

    Background: Similarity inference, one of the main bioinformatics tasks, has to face an exponential growth of the biological data. A classical approach used to cope with this data flow involves heuristics with large seed indexes. In order to speed up this technique, the index can be enhanced by storing additional information to limit the number of random memory accesses. However, this improvement leads to a larger index that may become a bottleneck. In the case of protein similarity search, we propose to decrease the index size by reducing the amino acid alphabet. Results: The paper presents two main contributions. First, we show that an optimal neighborhood indexing combining an alphabet reduction and a longer neighborhood leads to a 35% reduction in the memory involved in the process, without sacrificing the quality of results or the computational time. Second, our approach led us to develop a new kind of substitution score matrices and their associated e-value parameters. In contrast to usual matrices, these matrices are rectangular since they compare amino acid groups from different alphabets. We describe the method used for computing those matrices and we provide some typical examples that can be used in such comparisons. Supplementary data can be found on the website http://bioinfo.lifl.fr/reblosum. Conclusions: We propose a practical index size reduction of the neighborhood data that does not negatively affect the performance of large-scale search in protein sequences. Such an index can be used in any study involving large protein data. Moreover, rectangular substitution score matrices and their associated statistical parameters can have applications in any study involving an alphabet reduction.
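
The alphabet reduction mentioned in this abstract can be illustrated with a short sketch. The ten-group partition below is a hypothetical grouping chosen only for illustration; it is not the alphabet derived in the paper, and `reduce_sequence` and `seed_space` are assumed helper names.

```python
# Hypothetical partition of the 20 standard amino acids into 10 groups.
GROUPS = ["AST", "C", "DE", "FWY", "G", "H", "ILMV", "KR", "NQ", "P"]

# Each residue is represented by the first letter of its group.
LETTER_TO_GROUP = {aa: group[0] for group in GROUPS for aa in group}

def reduce_sequence(seq):
    """Rewrite a protein sequence over the reduced alphabet."""
    return "".join(LETTER_TO_GROUP[aa] for aa in seq)

def seed_space(alphabet_size, k):
    """Number of distinct words of length k over the given alphabet."""
    return alphabet_size ** k

reduced = reduce_sequence("MKVLDE")
full_words = seed_space(20, 4)                 # words of length 4, full alphabet
reduced_words = seed_space(len(GROUPS), 4)     # same length, reduced alphabet
```

With fewer distinct words per position, a neighborhood stored in the index needs fewer bits per residue, which is the kind of trade-off between alphabet size and neighborhood length that the abstract refers to.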

    Towards Reliable Automatic Protein Structure Alignment

    A variety of methods have been proposed for structure similarity calculation, which are called structure alignment or superposition. One major shortcoming in current structure alignment algorithms is in their inherent design, which is based on local structure similarity. In this work, we propose a method to incorporate global information in obtaining optimal alignments and superpositions. Our method, when applied to optimizing the TM-score and the GDT score, produces significantly better results than current state-of-the-art protein structure alignment tools. Specifically, if the highest TM-score found by TMalign is lower than 0.6 and the highest TM-score found by one of the tested methods is higher than 0.5, there is a probability of 42% that TMalign failed to find TM-scores higher than 0.5, while the same probability is reduced to 2% if our method is used. This could significantly improve the accuracy of fold detection if the cutoff TM-score of 0.5 is used. In addition, existing structure alignment algorithms focus on structure similarity alone and simply ignore other important similarities, such as sequence similarity. Our approach has the capacity to incorporate multiple similarities into the scoring function. Results show that sequence similarity aids in finding high quality protein structure alignments that are more consistent with eye-examined alignments in HOMSTRAD. Even when structure similarity itself fails to find alignments with any consistency with eye-examined alignments, our method remains capable of finding alignments highly similar to, or even identical to, eye-examined alignments. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)
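
For readers unfamiliar with the TM-score that this abstract optimizes, the sketch below evaluates the standard TM-score formula for a given superposition. It only scores a fixed list of aligned-residue distances and does not search for the optimal alignment, which is the hard part the paper addresses. The function names and the lower bound on `d0` are assumptions made for this illustration.

```python
def tm_d0(target_length, floor=0.5):
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.

    The floor for short chains is an assumption made here to keep d0
    positive and to avoid a cube root of a negative number.
    """
    if target_length <= 15:
        return floor
    return max(floor, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)

def tm_score(distances, target_length):
    """TM-score of one superposition.

    distances: distances (in angstroms) between aligned residue pairs.
    target_length: length of the target protein used for normalization.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

perfect = tm_score([0.0] * 100, 100)   # every residue aligned at zero distance
half = tm_score([0.0] * 50, 100)       # only half of the target aligned
```

A score above the 0.5 cutoff mentioned in the abstract is conventionally read as indicating the same fold.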

    The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment

    The optimal gapped local alignment score of two random sequences follows a Gumbel distribution. The Gumbel distribution has two parameters, the scale parameter λ and the pre-factor k. Presently, the basic local alignment search tool (BLAST) programs (BLASTP (BLAST for proteins), PSI-BLAST, etc.) all use time-consuming computer simulations to determine the Gumbel parameters. Because the simulations must be done offline, BLAST users are restricted in their choice of alignment scoring schemes. The ultimate aim of this paper is to speed the simulations, to determine the Gumbel parameters online, and to remove the corresponding restrictions on BLAST users. Simulations for the scale parameter λ can be as much as five times faster, if they use global instead of local alignment [R. Bundschuh (2002) J. Comput. Biol., 9, 243–260]. Unfortunately, the acceleration does not extend to determining the Gumbel pre-factor k, because k has no known mathematical relationship to global alignment. This paper relates k to global alignment and exploits the relationship to show that for the BLASTP defaults, 10 000 realizations with sequences of average length 140 suffice to estimate both Gumbel parameters λ and k within the errors required (λ, 0.8%; k, 10%). For the BLASTP defaults, simulations for both Gumbel parameters now take less than 30 s on a 2.8 GHz Pentium 4 processor
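
To show how the two Gumbel parameters enter alignment statistics, here is a minimal sketch of the usual E-value and P-value formulas, E = k·m·n·exp(−λx) and P = 1 − exp(−E). The numeric parameter values are placeholders chosen for illustration, not estimates from the paper, and the function names are assumed.

```python
import math

def expected_hits(score, lam, k, m, n):
    """Expected number of chance alignments scoring at least `score`
    for sequences of lengths m and n."""
    return k * m * n * math.exp(-lam * score)

def p_value(score, lam, k, m, n):
    """Probability of at least one chance alignment with score >= `score`.

    Uses expm1 so that very small P-values keep their precision.
    """
    return -math.expm1(-expected_hits(score, lam, k, m, n))

# Placeholder parameters (assumptions for illustration only).
LAMBDA, K = 0.27, 0.04
e_val = expected_hits(60, LAMBDA, K, 140, 140)
p_val = p_value(60, LAMBDA, K, 140, 140)
```

Because k multiplies the expected hit count directly, a 10% error in k translates into roughly a 10% error in small E-values, whereas λ sits in the exponent, which is consistent with the much tighter tolerance the abstract quotes for λ.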

    Candida albicans repetitive elements display epigenetic diversity and plasticity

    Transcriptionally silent heterochromatin is associated with repetitive DNA. It is poorly understood whether and how heterochromatin differs between different organisms and whether its structure can be remodelled in response to environmental signals. Here, we address this question by analysing the chromatin state associated with DNA repeats in the human fungal pathogen Candida albicans. Our analyses indicate that, contrary to model systems, each type of repetitive element is assembled into a distinct chromatin state. Classical Sir2-dependent hypoacetylated and hypomethylated chromatin is associated with the rDNA locus while telomeric regions are assembled into a weak heterochromatin that is only mildly hypoacetylated and hypomethylated. Major Repeat Sequences, a class of tandem repeats, are assembled into an intermediate chromatin state bearing features of both euchromatin and heterochromatin. Marker gene silencing assays and genome-wide RNA sequencing reveal that C. albicans heterochromatin represses expression of repeat-associated coding and non-coding RNAs. We find that telomeric heterochromatin is dynamic and remodelled upon an environmental change. Weak heterochromatin is associated with telomeres at 30 °C, while robust heterochromatin is assembled over these regions at 39 °C, a temperature mimicking moderate fever in the host. Thus in C. albicans, differential chromatin states control gene expression and epigenetic plasticity is linked to adaptation