350 research outputs found

    Biologia i computació (Biology and computation)


    In silico meets in vivo

    A report of the 6th Georgia Tech-Oak Ridge National Lab International Conference on Bioinformatics 'In silico Biology: Gene Discovery and Systems Genomics', Atlanta, USA, 15-17 November, 2007

    EGASP: Introduction


    Multiple non-collinear TF-map alignments of promoter regions

    BACKGROUND: The analysis of the promoter sequences of genes with similar expression patterns is a basic tool for annotating common regulatory elements. Multiple sequence alignments are at the basis of most comparative approaches. The characterization of regulatory regions from co-expressed genes at the sequence level, however, in many cases does not yield satisfactory results, because promoter regions of genes sharing similar expression programs often do not show nucleotide sequence conservation. RESULTS: In a recent approach to circumvent this limitation, we proposed to align the maps of predicted transcription factors (referred to as TF-maps) instead of the nucleotide sequences of two related promoters, taking into account the label of the corresponding factor and its position in the primary sequence. We have now extended the basic algorithm to permit multiple promoter comparisons using the progressive alignment paradigm. In addition, non-collinear conservation blocks can now be identified in the resulting alignments. We have optimized the parameters of the algorithm on a small but well-characterized collection of human-mouse-chicken-zebrafish orthologous gene promoters. CONCLUSION: Results on this dataset indicate that TF-map alignments are able to detect high-level regulatory conservation at the promoter and the 3'UTR gene regions, which cannot be detected by typical sequence alignments. Three particular examples are introduced here to illustrate the power of multiple TF-map alignments to characterize conserved regulatory elements in the absence of sequence similarity. We believe this kind of approach can be extremely useful in the future for annotating potential transcription factor binding sites in sets of co-regulated genes from high-throughput expression experiments.
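To make the idea of aligning TF-maps rather than nucleotides concrete, here is a minimal sketch of a pairwise, collinear alignment only. It is not the authors' algorithm: the function name `align_tf_maps`, the parameters `match` and `mu`, and the scoring rule (a fixed reward per same-label pair, minus a penalty proportional to the difference in spacing between consecutive aligned sites) are assumptions made for illustration. Each map is assumed to be a list of `(label, position)` sites sorted by position.

```python
from typing import Dict, List, Optional, Tuple

Site = Tuple[str, int]   # (factor label, position in the promoter) - illustrative
Pair = Tuple[int, int]   # (index into map a, index into map b)


def align_tf_maps(a: List[Site], b: List[Site],
                  match: float = 1.0, mu: float = 0.01
                  ) -> Tuple[float, List[Pair]]:
    """Return the best-scoring collinear chain of same-label site pairs.

    Hypothetical scoring: every aligned pair earns `match`; extending a
    chain costs `mu` times the difference between the spacing of the two
    sites in map a and the spacing of their partners in map b.
    """
    # Candidate pairs: sites that carry the same factor label.
    pairs = [(i, j) for i in range(len(a)) for j in range(len(b))
             if a[i][0] == b[j][0]]
    best: Dict[Pair, float] = {}
    prev: Dict[Pair, Optional[Pair]] = {}
    for i, j in pairs:                      # ordered by i, then j
        best[(i, j)] = match                # chain consisting of this pair only
        prev[(i, j)] = None
        for p, q in pairs:
            if p < i and q < j:             # collinear predecessor, already scored
                shift = abs((a[i][1] - a[p][1]) - (b[j][1] - b[q][1]))
                score = best[(p, q)] + match - mu * shift
                if score > best[(i, j)]:
                    best[(i, j)] = score
                    prev[(i, j)] = (p, q)
    if not best:
        return 0.0, []
    end = max(best, key=best.get)
    chain: List[Pair] = []
    node: Optional[Pair] = end
    while node is not None:                 # trace the chain back to its start
        chain.append(node)
        node = prev[node]
    chain.reverse()
    return best[end], chain
```

With maps such as `[("SP1", 10), ("TATA", 50), ("AP1", 90)]` and `[("TATA", 20), ("SP1", 30), ("AP1", 60)]`, the sketch prefers the TATA/AP1 chain, whose spacing is identical in both maps, over the SP1/AP1 chain. Labels and positions are matched even though no nucleotide similarity is consulted.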

    Mutation patterns of amino acid tandem repeats in the human proteome

    BACKGROUND: Amino acid tandem repeats are found in nearly one-fifth of human proteins. Abnormal expansion of these regions is associated with several human disorders. To gain further insight into the mutational mechanisms that operate in this type of sequence, we have analyzed a large number of mutation variants derived from human expressed sequence tags (ESTs). RESULTS: We identified 137 polymorphic variants in 115 different amino acid tandem repeats. Of these, 77 contained amino acid substitutions and 60 contained gaps (expansions or contractions of the repeat unit). The analysis showed that at least about 21% of the repeats might be polymorphic in humans. We compared the mutations found in different types of amino acid repeats and in adjacent regions. Overall, repeats showed a five-fold increase in the number of gap mutations compared to adjacent regions, reflecting the action of slippage within the repetitive structures. Gap and substitution mutations were very differently distributed between different amino acid repeat types. Among repeats containing gap variants we identified several disease and candidate disease genes. CONCLUSION: This is the first report at a genome-wide scale of the types of mutations occurring in the amino acid repeat component of the human proteome. We show that the mutational dynamics of different amino acid repeat types are very diverse. We provide a list of loci with highly variable repeat structures, some of which may be potentially involved in disease

    Topological quantum field theories: towards the cobordism hypothesis

    Bachelor's thesis in Mathematics, Faculty of Mathematics, Universitat de Barcelona, Year: 2015, Advisor: Carles Casacuberta. Topological quantum field theories (TQFTs) arose in the last century as an attempt to axiomatize quantum field theories from physics. While the underlying theory for quantum mechanics had been fully developed in terms of Hilbert spaces and operator theory, the analytic basis of quantum field theories remained unsettled, and several approaches are still being considered today. Surprisingly or not, the introduction of these new theories was received with great interest not only by physicists but also by the mathematical community. TQFTs became a recurrent field of study in mathematics mainly because of their interest from a topological standpoint. While TQFTs became less popular over time in physics, the mathematical approach has increasingly attracted the attention of researchers because of its natural drift towards the homotopy theory of higher categories, yielding some outstanding results such as the Cobordism Hypothesis, formulated by John Baez and James Dolan and recently proved by Jacob Lurie. The original definition of a TQFT was given by Michael Atiyah in 1988 as a generalization of group representations to category theory. A TQFT was defined as a functor from the category of cobordisms (smooth manifolds with boundary and additional structure) to the category of vector spaces (Atiyah originally formulated the definition in terms of $\Lambda$-modules for a ring $\Lambda$). The definition, as Atiyah himself stated, was inspired by earlier work of Edward Witten on supersymmetry and Graeme Segal on conformal theory. A successful understanding of TQFTs in low dimensions was rapidly achieved, and several theories in dimensions $\leq 4$ were developed.
Anticipating later developments, Baez and Dolan suggested in 1995 that a richer theory lay behind the classical formulation; although their work lacked formal rigor, it laid out essential guidelines for study. The mathematical importance of TQFTs has not gone unnoticed, and at least four Fields Medals have been awarded to date to mathematicians for research related to TQFTs: Simon Donaldson, Vaughan Jones, Edward Witten and Maxim Kontsevich
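For reference, the functorial definition mentioned in the abstract can be written out as follows. This is a sketch in one common modern notation, not necessarily the thesis's own: the symbols $Z$, $\mathrm{Cob}_n$, $\mathrm{Vect}_k$ and the ground field $k$ are assumptions of this note, and the monoidal condition is the standard one rather than something stated in the abstract.

```latex
% An n-dimensional TQFT as a symmetric monoidal functor (notation assumed)
\[
  Z \colon \mathrm{Cob}_n \longrightarrow \mathrm{Vect}_k ,
  \qquad
  Z(\Sigma_1 \sqcup \Sigma_2) \cong Z(\Sigma_1) \otimes Z(\Sigma_2),
  \qquad
  Z(\emptyset) \cong k .
\]
% A cobordism M from \Sigma_0 to \Sigma_1 is sent to a linear map
\[
  Z(M) \colon Z(\Sigma_0) \longrightarrow Z(\Sigma_1).
\]
```

In particular, a closed $n$-manifold $M$, viewed as a cobordism from $\emptyset$ to $\emptyset$, is sent to a linear map $k \to k$, that is, to a scalar in $k$, which is a numerical invariant of $M$.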

    Comparison of splice sites in mammals and chicken.

    We have carried out an initial analysis of the dynamics of the recent evolution of splice-site sequences in a large collection of human, rodent (mouse and rat), and chicken introns. Our results indicate that the sequences of splice sites are largely homogeneous within Tetrapoda. We have also found that orthologous splice signals between human and rodents and within rodents are more conserved than unrelated splice sites, but the additional conservation can be explained mostly by background intron conservation. In contrast, additional conservation over background is detectable in orthologous mammalian and chicken splice sites. Our results also indicate that the U2 and U12 intron classes seem to have evolved independently since the split of mammals and birds; we have not been able to find a convincing case of interconversion between these two classes in our collections of orthologous introns. Similarly, we have not found a single case of switching between AT-AC and GT-AG subtypes within U12 introns, suggesting that this event has been a rare occurrence in recent evolutionary times. Switching between GT-AG and the noncanonical GC-AG U2 subtypes, on the contrary, does not appear to be unusual; in particular, T to C mutations appear to be relatively well tolerated in GT-AG introns with very strong donor sites

    An assessment of gene prediction accuracy in large DNA sequences

    One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set is of considerable importance for human health and medicine. Though progress has been made on computational gene identification in terms of both methods and accuracy evaluation measures, most of the sequence sets in which the programs are tested are short genomic sequences, and there is concern that these accuracy measures may not extrapolate well to larger, more challenging data sets. Given the absence of experimentally verified large genomic data sets, we constructed a semiartificial test set comprising a number of short single-gene genomic sequences with randomly generated intergenic regions. This test set, which should still present an easier problem than real human genomic sequence, mimics the ∼200 kb long BACs being sequenced. In our experiments with these longer genomic sequences, the accuracy of GENSCAN, one of the most accurate ab initio gene prediction programs, dropped significantly, although its sensitivity remained high. Conversely, the accuracy of similarity-based programs, such as GENEWISE, PROCRUSTES, and BLASTX, was not affected significantly by the presence of random intergenic sequence, but depended on the strength of the similarity to the protein homolog. As expected, the accuracy dropped if the models were built using more distant homologs, and we were able to quantitatively estimate this decline. However, the specificities of these techniques are still rather good even when the similarity is weak, which is a desirable characteristic for driving expensive follow-up experiments.
Our experiments suggest that though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methodology
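The construction of the semiartificial test set described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `build_semiartificial`, the fixed spacer length, and the uniform base composition of the random intergenic sequence are all hypothetical choices (the original study may have drawn lengths and base frequencies differently).

```python
import random
from typing import List, Tuple


def build_semiartificial(genes: List[str], spacer_len: int,
                         seed: int = 0) -> Tuple[str, List[Tuple[int, int]]]:
    """Embed single-gene sequences in random intergenic DNA.

    Returns the concatenated sequence and the 0-based, half-open
    (start, end) coordinates of each embedded gene, so that predictions
    on the long sequence can be scored against known gene locations.
    """
    rng = random.Random(seed)       # seeded for reproducibility

    def spacer() -> str:
        # Assumption: uniform A/C/G/T composition for intergenic sequence.
        return "".join(rng.choice("ACGT") for _ in range(spacer_len))

    parts: List[str] = []
    coords: List[Tuple[int, int]] = []
    pos = 0
    for gene in genes:
        parts.append(spacer())      # random intergenic region before the gene
        pos += spacer_len
        coords.append((pos, pos + len(gene)))
        parts.append(gene)
        pos += len(gene)
    parts.append(spacer())          # trailing intergenic region
    return "".join(parts), coords
```

Keeping the gene coordinates alongside the sequence is the point of the design: the true annotation stays known while the sequence grows to a more realistic length.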