47 research outputs found

    Algebraic correction methods for computational assessment of clone overlaps in DNA fingerprint mapping

    Abstract
    Background: The Sulston score is a well-established, though approximate, metric for probabilistically evaluating postulated clone overlaps in DNA fingerprint mapping. It is known to systematically over-predict match probabilities by various orders of magnitude, depending upon project-specific parameters. Although the exact probability distribution is also available for the comparison problem, it is rather difficult to compute and cannot be used directly in most cases. A methodology providing both improved accuracy and computational economy is required.
    Results: We propose a straightforward algebraic correction procedure, which takes the Sulston score as a provisional value and applies a power-law equation to obtain an improved result. Numerical comparisons indicate dramatically increased accuracy over the range of parameters typical of traditional agarose fingerprint mapping. Issues with extrapolating the method into parameter ranges characteristic of newer capillary electrophoresis-based projects are also discussed.
    Conclusion: Although only marginally more expensive to compute than the raw Sulston score, the correction provides a vastly improved probabilistic description of hypothesized clone overlaps. This will clearly be important in overlap assessment, and perhaps for other tasks as well, for example in using the ranking of overlap probabilities to assist in clone ordering.
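    The correction described above feeds the raw Sulston score through a power-law equation. A minimal sketch, assuming the illustrative form S_corr = 10^(a·log10(S) + b); the coefficients `a` and `b` below are placeholder values, not the fitted parameters from the paper:

```python
import math

def corrected_overlap_probability(sulston_score, a=0.9, b=-1.2):
    """Apply a power-law correction to a provisional Sulston score.

    The functional form 10**(a*log10(S) + b) and the default coefficients
    are illustrative assumptions, not the published fit.
    """
    if not (0.0 < sulston_score <= 1.0):
        raise ValueError("Sulston score must be a probability in (0, 1]")
    log_s = math.log10(sulston_score)
    # correct the systematic over-prediction of the raw score
    return 10.0 ** (a * log_s + b)
```

    With these placeholder coefficients the correction deflates the raw score while preserving its ranking of candidate overlaps, which is the property the abstract highlights for clone ordering.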

    LTC: a novel algorithm to improve the efficiency of contig assembly for physical mapping in complex genomes

    Abstract
    Background: Physical maps are the substrate of genome sequencing and map-based cloning, and their construction relies on the accurate assembly of BAC clones into large contigs that are then anchored to genetic maps with molecular markers. High Information Content Fingerprinting has become the method of choice for large and repetitive genomes such as those of maize, barley, and wheat. However, the high level of repeated DNA present in these genomes requires the application of very stringent criteria to ensure a reliable assembly with the FingerPrinted Contig (FPC) software, which often results in short contig lengths (of 3-5 clones before merging) as well as an unreliable assembly in some difficult regions. Difficulties can originate from a non-linear topological structure of clone overlaps, the low power of clone ordering algorithms, and the absence of tools to identify sources of gaps in Minimal Tiling Paths (MTPs).
    Results: To address these problems, we propose a novel approach that: (i) reduces the rate of false connections and Q-clones by using a new cutoff calculation method; (ii) obtains reliable clusters robust to the exclusion of a single clone or clone overlap; (iii) explores the topological contig structure by considering contigs as networks of clones connected by significant overlaps; (iv) performs iterative clone clustering combined with ordering and order verification using re-sampling methods; and (v) uses global optimization methods for clone ordering and Band Map construction. The elements of this new analytical framework, called Linear Topological Contig (LTC), were applied to datasets used previously for the construction of the physical map of wheat chromosome 3B with FPC. The performance of LTC vs. FPC was also compared on simulated BAC libraries based on the known genome sequences of chromosome 1 of rice and chromosome 1 of maize.
    Conclusions: The results show that, compared to other methods, LTC enables the construction of highly reliable and longer contigs (5-12 clones before merging), the detection of "weak" connections in contigs and their "repair", and the elongation of contigs obtained by other assembly methods.
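    LTC's view of contigs as networks of clones connected by significant overlaps can be sketched as a simple graph-clustering step: edges below a cutoff are discarded and the connected components of the remaining graph become candidate contigs. This is a sketch of the clustering stage only, under an assumed score/cutoff representation; LTC's cutoff calculation, Q-clone filtering, ordering, and verification steps are not reproduced here.

```python
from collections import defaultdict, deque

def contig_clusters(overlaps, cutoff):
    """Group clones into candidate contigs via significant-overlap edges.

    `overlaps` maps clone pairs to an overlap score (an assumed input
    format); edges scoring below `cutoff` are dropped, and connected
    components of the remaining clone network are returned.
    """
    graph = defaultdict(set)
    for (a, b), score in overlaps.items():
        if score >= cutoff:
            graph[a].add(b)
            graph[b].add(a)
        else:
            graph[a]  # ensure clones with no significant overlap
            graph[b]  # still appear as singleton contigs
    seen, clusters = set(), []
    for node in graph:
        if node in seen:
            continue
        component, queue = set(), deque([node])
        while queue:  # breadth-first traversal of one component
            n = queue.popleft()
            if n in component:
                continue
            component.add(n)
            queue.extend(graph[n] - component)
        seen |= component
        clusters.append(component)
    return clusters
```

    Raising the cutoff mirrors the stringency trade-off the abstract describes: fewer false connections, but shorter contigs.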

    A compartmentalized approach to the assembly of physical maps

    Abstract
    Background: Physical maps have historically been one of the cornerstones of genome sequencing and map-based cloning strategies. They also support marker-assisted breeding and EST mapping. The problem of building a high-quality physical map is computationally challenging due to unavoidable noise in the input fingerprint data.
    Results: We propose a novel compartmentalized method for the assembly of high-quality physical maps from fingerprinted clones. Knowledge of genetic markers enables us to group clones into clusters so that clones in the same cluster are more likely to overlap. For each cluster of clones, a local physical map is first constructed using FingerPrinted Contigs (FPC). Then, all the individual maps are carefully merged into the final physical map. Experimental results on the genomes of rice and barley demonstrate that the compartmentalized assembly produces significantly more accurate maps, and that it can detect and isolate clones that would induce "chimeric" contigs if used in the final assembly.
    Conclusion: The software is available for download at http://www.cs.ucr.edu/~sbozdag/assembler/
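    The compartmentalization step, grouping clones by shared genetic markers before local FPC assembly, can be sketched as a union-find over markers. The grouping criterion used here (any shared marker puts two clones in the same compartment) and the input format are illustrative assumptions, not the paper's exact procedure; local assembly and map merging are not shown.

```python
from collections import defaultdict

def compartmentalize(clone_markers):
    """Group clones into compartments by shared genetic markers.

    `clone_markers` maps a clone name to the non-empty set of markers it
    hits (an assumed input format). Markers hit by the same clone are
    unioned, and clones whose markers share a root form one compartment.
    """
    parent = {}

    def find(marker):  # union-find with path halving
        parent.setdefault(marker, marker)
        while parent[marker] != marker:
            parent[marker] = parent[parent[marker]]
            marker = parent[marker]
        return marker

    def union(a, b):
        parent[find(a)] = find(b)

    for markers in clone_markers.values():
        first, *rest = list(markers)
        for m in rest:
            union(first, m)

    compartments = defaultdict(set)
    for clone, markers in clone_markers.items():
        compartments[find(next(iter(markers)))].add(clone)
    return list(compartments.values())
```

    Each returned compartment would then be assembled locally (with FPC, per the abstract) before the individual maps are merged.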

    Structure-based approaches applied to the study of pharmaceutical relevant targets

    Computer Aided Drug Design/Discovery (CADD) methods have become complementary to traditional and modern drug discovery approaches. Indeed, CADD is useful for improving and speeding up the detection and optimization of bioactive molecules. The present study focuses on the application of structure-based approaches to the study of pharmaceutically relevant targets. The introduction provides a quick overview of the fundamentals of computational chemistry and structure-based methods, while the subsequent chapters treat the main targets investigated with these methods. In particular, we focused our attention on the Reverse Transcriptase of HIV-1, Monoamine oxidase B, and VP35 of the Ebola virus. The last chapter is dedicated to the validation of covalent docking performed with AutoDock.

    HLA class I supertype and supermotif definition by chemometric approaches.

    Activation of cytotoxic T cells in humans requires specific binding of antigenic peptides to human leukocyte antigen (HLA) molecules. HLA is the most polymorphic protein in the human body, with 1814 different alleles currently collected in the HLA sequence database at the European Bioinformatics Institute. Most HLA molecules recognise different peptides, and some peptides can be recognised by several HLA molecules. In the present project, all available class I HLA alleles are classified into supertypes, and super-binding motifs are defined for peptides binding to those supertypes for which binding data are available. A variety of chemometric techniques are used in the project, including 2D and 3D QSAR techniques and different variable selection methods such as SIMCA, GOLPE, and genetic algorithms. Principal component analysis, combined with molecular interaction fields calculated by the program GRID, is used in the class I HLA classification. This thesis defines an HLA-A3 supermotif using two QSAR methods: the 3D-QSAR method CoMSIA and a recently developed 2D-QSAR method named the additive method. Four alleles with high phenotype frequency were included in the study: HLA-A*0301, HLA-A*1101, HLA-A*3101 and HLA-A*6801. An A*0201 binding motif is also defined using amino acid descriptors and variable selection methods. Novel peptides have been designed according to the motifs, and their binding affinities have been tested experimentally. The results of the additive method are used in the online server MHCPred to predict the binding affinity of unknown peptides. In the HLA classification, the HLA-A, B and C molecules are classified into supertypes separately. A total of eight supertypes are observed for class I HLA: the A2, A3, A24, B7, B27, B44, C1 and C4 supertypes. Using this classification, any newly discovered class I HLA molecule can easily be grouped into a supertype, thus simplifying the experimental functional characterisation process.
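    The PCA-based classification can be illustrated schematically: alleles described by molecular interaction field values are projected onto the leading principal components and grouped by proximity in score space. The greedy grouping rule and distance threshold below are assumptions made for illustration; the thesis derives supertypes from analysis of the PCA scores, not from this exact rule.

```python
import numpy as np

def supertype_clusters(field_matrix, n_components=2, threshold=1.0):
    """Schematic PCA-based grouping of alleles into supertype candidates.

    Rows of `field_matrix` are alleles, columns are interaction field
    values (as GRID would compute). Alleles are projected onto the
    leading principal components and greedily grouped when their scores
    lie within `threshold`; both choices are illustrative assumptions.
    """
    X = field_matrix - field_matrix.mean(axis=0)  # centre the data
    # principal axes from the SVD of the centred matrix
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ vt[:n_components].T
    clusters = []
    for i, s in enumerate(scores):
        for cluster in clusters:
            # join the first cluster whose seed allele is close enough
            if np.linalg.norm(scores[cluster[0]] - s) < threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

    Grouping in the reduced score space, rather than on the raw field values, is what lets a newly sequenced allele be assigned to a supertype without new binding experiments, as the abstract notes.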

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as the Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predicting and analysing natural and complex systems in science and engineering. As their level of abstraction rises to allow better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. High Performance Computing, on the other hand, typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication, and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Computer Science & Technology Series : XIX Argentine Congress of Computer Science. Selected papers

    CACIC’13 was the nineteenth Congress in the CACIC series. It was organized by the Department of Computer Systems at the CAECE University in Mar del Plata. The Congress included 13 Workshops with 165 accepted papers, 5 Conferences, 3 invited tutorials, various meetings related to Computer Science Education (professors, PhD students, curricula), and an International School with 5 courses. CACIC 2013 was organized following the traditional Congress format, with 13 Workshops covering a diversity of dimensions of Computer Science research. Each topic was supervised by a committee of 3-5 chairs from different universities. The call for papers attracted a total of 247 submissions. An average of 2.5 review reports were collected for each paper, for a grand total of 676 review reports that involved about 210 different reviewers. A total of 165 full papers, involving 489 authors and 80 universities, were accepted, and 25 of them were selected for this book. Red de Universidades con Carreras en Informática (RedUNCI)
