
    Thermodynamics of protein folding: a random matrix formulation

    The process of protein folding from an unfolded state to a biologically active, folded conformation is governed by many parameters, e.g., the amino-acid sequence, intermolecular interactions, the solvent, temperature, and chaperone molecules. Our study, based on random matrix modeling of the interactions, shows, however, that the evolution of the statistical measures, e.g., Gibbs free energy, heat capacity, and entropy, is single-parametric. This information can explain the selection of specific folding pathways from an infinite number of possible ones, as well as other folding characteristics observed in computer simulation studies. Comment: 21 pages, no figures.
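
    A minimal sketch of the idea that thermodynamic measures follow from a single spectrum: the GOE ensemble, matrix size, and temperature below are my own illustrative assumptions, not the paper's actual model, and only show how Gibbs free energy, entropy, and heat capacity are derived from random-matrix eigenvalues.

```python
# Hypothetical illustration: thermodynamic measures from a random-matrix
# spectrum (GOE chosen for concreteness; not the paper's specific model).
import numpy as np

def goe_spectrum(n, rng):
    """Eigenvalues of an n x n Gaussian Orthogonal Ensemble matrix."""
    a = rng.normal(size=(n, n))
    h = (a + a.T) / 2.0
    return np.linalg.eigvalsh(h)

def thermodynamics(energies, beta):
    """Canonical F, S, C at inverse temperature beta (k_B = 1)."""
    w = np.exp(-beta * (energies - energies.min()))    # shifted for stability
    z = w.sum()
    p = w / z
    u = (p * energies).sum()                           # internal energy
    f = energies.min() - np.log(z) / beta              # Gibbs/Helmholtz free energy
    s = beta * (u - f)                                 # entropy
    c = beta**2 * ((p * energies**2).sum() - u**2)     # heat capacity = beta^2 Var(E)
    return f, s, c

rng = np.random.default_rng(0)
print(thermodynamics(goe_spectrum(200, rng), beta=1.0))
```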

    Limited Lifespan of Fragile Regions in Mammalian Evolution

    An important question in genome evolution is whether there exist fragile regions (rearrangement hotspots) where chromosomal rearrangements happen over and over again. Although nearly all recent studies support the existence of fragile regions in mammalian genomes, the most comprehensive phylogenomic study of mammals (Ma et al. (2006) Genome Research 16, 1557-1565) raised some doubts about their existence. We demonstrate that fragile regions are subject to a "birth and death" process, implying that fragility has a limited evolutionary lifespan. This finding implies that fragile regions migrate to different locations in different mammals, explaining why only a few chromosomal breakpoints are shared between different lineages. The birth and death of fragile regions reinforces the hypothesis that rearrangements are promoted by matching segmental duplications, and suggests putative locations of the currently active fragile regions in the human genome.
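
    A toy simulation of the qualitative claim, under assumptions entirely my own (rates, genome size, and the discrete-time scheme are arbitrary): if fragile regions are born and die independently along two lineages that share an ancestral set, the breakpoint sets the lineages accumulate end up sharing only a few locations.

```python
# Toy "birth and death" simulation of fragile regions (illustration only).
import random

def evolve(active, steps, birth=0.2, death=0.2, genome=1000):
    used = set()
    for _ in range(steps):
        if random.random() < birth:
            active.add(random.randrange(genome))          # a fragile region is born
        if active and random.random() < death:
            active.discard(random.choice(sorted(active))) # an old one dies
        if active:
            used.add(random.choice(sorted(active)))       # a rearrangement uses one
    return used

random.seed(1)
ancestral = {random.randrange(1000) for _ in range(5)}
a = evolve(set(ancestral), 500)
b = evolve(set(ancestral), 500)
print(len(a), len(b), len(a & b))  # breakpoints shared between lineages stay few
```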

    Safe and complete contig assembly via omnitigs

    Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs: a set of strings that are promised to appear in any genome that could have generated the reads. Since the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never solved: given a genome graph $G$ (e.g., a de Bruijn graph or a string graph), what are all the strings that can be safely reported from $G$ as contigs? In this paper we finally answer this question, and also give a polynomial-time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and that 29% of dbSNP locations have more neighbors in omnitigs than in unitigs. Comment: Full version of the paper in the proceedings of RECOMB 2016.
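
    For context, a minimal sketch of the baseline that omnitigs extend: unitigs, i.e., maximal non-branching paths in a node-centric de Bruijn graph. This is not the paper's omnitig algorithm (which is substantially more involved); the graph construction and the example read are my own assumptions.

```python
# Baseline only: extract unitigs from a node-centric de Bruijn graph.
from collections import defaultdict

def de_bruijn(reads, k):
    succ, pred = defaultdict(set), defaultdict(set)
    for r in reads:
        for i in range(len(r) - k):
            a, b = r[i:i + k], r[i + 1:i + k + 1]
            succ[a].add(b)
            pred[b].add(a)
    return succ, pred

def unitigs(succ, pred):
    """Maximal paths whose internal nodes have in-degree = out-degree = 1.
    (Isolated cycles of such nodes are ignored in this sketch.)"""
    nodes = set(succ) | set(pred)
    is_internal = lambda v: len(succ[v]) == 1 and len(pred[v]) == 1
    out = []
    for v in nodes:
        if is_internal(v):
            continue
        for w in list(succ[v]):
            path = v
            while True:
                path += w[-1]
                if not is_internal(w):
                    break
                w = next(iter(succ[w]))
            out.append(path)
    return out

succ, pred = de_bruijn(["ACGTACGG"], k=3)
print(sorted(unitigs(succ, pred)))  # ['ACGG', 'ACGTACG']
```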

    Space-efficient merging of succinct de Bruijn graphs

    We propose a new algorithm for merging succinct representations of de Bruijn graphs introduced in [Bowe et al., WABI 2012]. Our algorithm is based on the lightweight BWT merging approach of Holt and McMillan [Bioinformatics 2014, ACM-BCB 2014]. It has the same asymptotic cost as the state-of-the-art tool for the same problem by Muggli et al. [bioRxiv 2017, Bioinformatics 2019], but uses less than half of its working space. An important novel feature of our algorithm, not found in any of the existing tools, is that it can compute the variable-order succinct representation of the union graph within the same asymptotic time/space bounds. Comment: Accepted to SPIRE'19.
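
    To fix what "union graph" means, here is a naive baseline under my own assumptions (graphs given as plain k-mer sets). The paper's contribution is performing the equivalent merge directly on the succinct (BOSS) representations in small working space, which this sketch does not attempt.

```python
# Naive de Bruijn graph merge: the union graph over two plain k-mer sets.
def merge_kmer_sets(kmers_a, kmers_b):
    """A (k-1)-mer edge (u, v) exists iff some k-mer of either input spells it."""
    union = set(kmers_a) | set(kmers_b)
    edges = {(km[:-1], km[1:]) for km in union}
    return union, edges

a = {"ACG", "CGT"}
b = {"CGT", "GTA"}
kmers, edges = merge_kmer_sets(a, b)
print(sorted(kmers), sorted(edges))
```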

    String Matching and 1d Lattice Gases

    We calculate the probability distribution for the number of occurrences $n$ of a given $l$-letter word in a random string of $k$ letters drawn from an alphabet of size $r$. Analytical expressions for the distribution are known for the asymptotic regimes (i) $k \gg r^l \gg 1$ (Gaussian) and (ii) $k, l \to \infty$ such that $k/r^l$ is finite (compound Poisson). However, it is known that these distributions do not work well in the intermediate regime $k \gtrsim r^l \gtrsim 1$. We show that the problem of calculating the string matching probability can be cast as determining the configurational partition function of a 1d lattice gas with interacting particles, so that the matching probability becomes the grand-partition sum of the lattice gas, with the number of particles corresponding to the number of matches. We perform a virial expansion of the effective equation of state and obtain the probability distribution. Our result reproduces the behavior of the distribution in all regimes. We are also able to show analytically how the limiting distributions arise. Our analysis builds on the fact that the effective interactions between the particles consist of a relatively strong core of size $l$, the word length, followed by a weak, exponentially decaying tail. We find that the asymptotic regimes correspond to the case where the tail of the interactions can be neglected, while in the intermediate regime it must be kept in the analysis. Our results are readily generalized to the case where the random strings are generated by more complicated stochastic processes, such as a non-uniform letter probability distribution or Markov chains. We show that in these cases the tails of the effective interactions can become even more dominant, rendering the asymptotic approximations less accurate in such a regime. Comment: 44 pages and 8 figures. Major revision of the previous version. The lattice gas analogy has been worked out in full, including the virial expansion and equation of state; this now constitutes the main part of the paper. Connections with existing work are made, and references should be up to date. To be submitted for publication.
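
    A Monte Carlo sketch (my own illustration, not the paper's analytical method) of the distribution in question: the count of possibly overlapping matches of an $l$-letter word in a uniform random string. The parameters below put it in the intermediate regime $k \gtrsim r^l$ ($l=3$, $r=4$, $k=100$).

```python
# Empirical occurrence-count distribution for a word in uniform random strings.
import random
from collections import Counter

def match_count(s, w):
    """Number of (possibly overlapping) occurrences of w in s."""
    return sum(1 for i in range(len(s) - len(w) + 1) if s[i:i + len(w)] == w)

def occurrence_distribution(word, k, r=4, trials=20000, seed=0):
    rng = random.Random(seed)
    alphabet = "ACGT"[:r]
    counts = Counter()
    for _ in range(trials):
        s = "".join(rng.choice(alphabet) for _ in range(k))
        counts[match_count(s, word)] += 1
    return {n: c / trials for n, c in sorted(counts.items())}

# intermediate regime: k = 100 is only moderately larger than r^l = 64
print(occurrence_distribution("AAA", k=100))
```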

    Applying a User-centred Approach to Interactive Visualization Design

    Analysing users in their context of work and finding out how and why they use different information resources is essential to provide interactive visualisation systems that match their goals and needs. Designers should actively involve the intended users throughout the whole process. This chapter presents a user-centred approach to the design of interactive visualisation systems. We describe three phases of the iterative visualisation design process: the early envisioning phase, the global specification phase, and the detailed specification phase. The whole design cycle is repeated until some criterion of success is reached. We discuss different techniques for the analysis of users, their tasks, and their domain. Subsequently, the design of prototypes and evaluation methods in visualisation practice are presented. Finally, we discuss the practical challenges in the design and evaluation of collaborative visualisation environments. Our own case studies and those of others are used throughout the chapter to illustrate various approaches.

    Efficacy and safety of left atrial appendage closure in patients with atrial fibrillation and high thromboembolic and bleeding risk

    Aim. To compare the incidence of thromboembolic and hemorrhagic events after left atrial appendage occlusion (LAAO) versus no prevention of thromboembolic events (TEEs) during prospective follow-up of patients with atrial fibrillation (AF) and a high risk of ischemic stroke (IS) who have contraindications to long-term anticoagulant therapy. Material and methods. The study included 134 patients with AF, a high risk of IS, and contraindications to long-term anticoagulation. Patients were divided into two groups: the first group comprised patients who underwent LAAO (n=74), while the second comprised those who did not receive any TEE prevention (n=60). The follow-up period was 3 years. The cumulative rate of all-cause mortality, IS, transient ischemic attack (TIA), and systemic embolism (SE) was taken as the primary efficacy endpoint. The primary safety endpoint was major bleeding according to the GARFIELD registry criteria. Results. The rate of the composite efficacy endpoint in the LAAO group was significantly lower than in the group without thromboembolic prophylaxis (5.2 vs 17.4 per 100 patient-years; adjusted odds ratio (OR) 4.08; 95% confidence interval (CI): 1.7-9.5; p=0.001). The rate of major bleeding was comparable in both groups (2.4 per 100 patient-years in the LAAO group vs 1.3 in the group without thromboembolic prophylaxis; adjusted OR 0.55; 95% CI: 0.1-3.09; p=0.509). In addition, the event rate for net clinical benefit (all-cause mortality + ischemic stroke/TIA/SE + major bleeding) in the LAAO group was also significantly lower (5.9 vs 18.2 per 100 patient-years; adjusted OR 3.0; 95% CI: 1.47-6.36; p=0.003). Conclusion. Among patients with AF and contraindications to long-term anticoagulation, LAAO demonstrated a significant reduction in the cumulative rate of all-cause mortality and non-fatal thromboembolic events after 3 years of follow-up. At the same time, the frequency of major bleeding was comparable between the groups, even taking into account access-site bleeding and postoperative antithrombotic therapy (ATT)-associated bleeding in the LAAO group. Further randomized clinical trials are required to confirm these data.
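
    For readers unfamiliar with the "per 100 patient-years" metric used above, a back-of-envelope sketch; the event and follow-up numbers below are hypothetical, not taken from the study.

```python
# Incidence rate per 100 patient-years = 100 * events / total follow-up years.
def rate_per_100_py(events, patient_years):
    return 100.0 * events / patient_years

# e.g., 11 hypothetical composite-endpoint events over 74 patients * ~3 years
print(round(rate_per_100_py(11, 74 * 3), 1))  # about 5.0 per 100 patient-years
```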

    Viral population estimation using pyrosequencing

    The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response, as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate-based sequencing technologies (pyrosequencing) can be used for quantifying this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug-resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an EM algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction, both by extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies. Comment: 23 pages, 13 figures.
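
    A sketch of the frequency-inference step, under simplifying assumptions of my own: reads are already error-corrected and each read's set of compatible haplotypes is given as a 0/1 matrix (the paper's read model is richer than this). The EM updates below are the standard ones for a mixture with uniform emission over compatible haplotypes.

```python
# EM estimation of haplotype frequencies from read-haplotype compatibility.
import numpy as np

def em_frequencies(compat, n_iter=200):
    """compat: (n_reads, n_haplotypes) 0/1 matrix; returns frequency vector."""
    n_reads, n_hap = compat.shape
    f = np.full(n_hap, 1.0 / n_hap)           # uniform initialization
    for _ in range(n_iter):
        w = compat * f                         # E-step: haplotype responsibilities
        w /= w.sum(axis=1, keepdims=True)
        f = w.mean(axis=0)                     # M-step: average responsibilities
    return f

compat = np.array([[1, 1, 0],
                   [1, 0, 0],
                   [0, 1, 1],
                   [0, 0, 1]])
print(em_frequencies(compat).round(3))         # frequencies sum to 1
```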

    Efficient algorithms for analyzing segmental duplications with deletions and inversions in genomes

    Background: Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics comprising multiple duplicated fragments. This complex genomic organization complicates analysis of the evolutionary history of these sequences. One model proposed to explain these mosaic patterns is repeated aggregation and subsequent duplication of genomic sequences. Results: We describe a polynomial-time exact algorithm to compute duplication distance, a genomic distance defined as the most parsimonious way to build a target string by repeatedly copying substrings of a fixed source string. This distance models the process of repeated aggregation and duplication. We also describe extensions of this distance to include certain types of substring deletions and inversions. Finally, we provide a description of a sequence of duplication events as a context-free grammar (CFG). Conclusion: These new genomic distances will permit more biologically realistic analyses of segmental duplications in genomes.
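
    A deliberately simplified illustration, not the paper's algorithm: if every copied substring is appended left to right (no insertions into the middle, no deletions or inversions), building the target from substrings of the source uses the fewest copies when each copy is greedily taken as long as possible. The general duplication distance requires the polynomial-time dynamic program described in the paper.

```python
# Greedy factorization of a target into substrings of a source string.
def greedy_copy_factorization(source, target):
    pieces, i = [], 0
    while i < len(target):
        best = 0
        # longest prefix of target[i:] that occurs somewhere in source
        for j in range(len(target), i, -1):
            if target[i:j] in source:
                best = j - i
                break
        if best == 0:
            return None   # a letter of the target never occurs in the source
        pieces.append(target[i:i + best])
        i += best
    return pieces

print(greedy_copy_factorization("ABCDE", "CDEABAB"))  # ['CDE', 'AB', 'AB']
```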

    ConDeTri - A Content Dependent Read Trimmer for Illumina Data

    During the last few years, DNA and RNA sequencing have started to play an increasingly important role in biological and medical applications, especially due to the greater amount of data yielded by new sequencing machines and the enormous decrease in sequencing costs. In particular, Illumina/Solexa sequencing has had an increasing impact on gathering data from model and non-model organisms. However, accurate and easy-to-use tools for quality filtering have not yet been established. We present ConDeTri, a method for content-dependent read trimming of next-generation sequencing data using the quality score of each individual base. The main focus of the method is to remove sequencing errors from reads so that sequencing reads can be standardized. Another aspect of the method is to incorporate read trimming into next-generation sequencing data processing and analysis pipelines. It can process single-end and paired-end sequence data of arbitrary length, and it is independent of sequencing coverage and user interaction. ConDeTri is able to trim and remove reads with low quality scores, saving computational time and memory during de novo assemblies. Low-coverage and large genome sequencing projects will especially benefit from trimming reads. The method can easily be incorporated into preprocessing and analysis pipelines for Illumina data.
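
    A minimal quality trimmer in the same spirit, though not ConDeTri itself (its content-dependent criterion is more elaborate): cut the trailing run of bases whose Phred score, decoded from the FASTQ quality string, falls below a threshold. The threshold and offset values are common defaults, assumed here for illustration.

```python
# Trim low-quality bases from the 3' end of a FASTQ read.
def trim_3prime(seq, qual, threshold=20, phred_offset=33):
    """Cut the read before the trailing run of bases below `threshold`."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - phred_offset < threshold:
        end -= 1
    return seq[:end], qual[:end]

seq, qual = trim_3prime("ACGTACGT", "IIIIII##")  # '#' encodes Phred 2
print(seq, qual)                                 # ACGTAC IIIIII
```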