Thermodynamics of protein folding: a random matrix formulation
The process of protein folding from an unfolded state to a biologically
active, folded conformation is governed by many parameters, e.g. the sequence
of amino acids, intermolecular interactions, the solvent, temperature, and
chaperone molecules. Our study, based on random matrix modeling of the
interactions, shows, however, that the evolution of the statistical measures,
e.g. Gibbs free energy, heat capacity, and entropy, is single-parametric. This
information can explain the selection of specific folding pathways from an
infinite number of possible ones, as well as other folding characteristics
observed in computer simulation studies.
Comment: 21 pages, no figures
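As an illustration of the kind of random-matrix treatment described above (a sketch, not the authors' actual model), the code below samples Gaussian Orthogonal Ensemble eigenvalues as a stand-in energy spectrum and computes free energy, entropy, and heat capacity from the canonical partition function; the ensemble size and temperatures are arbitrary choices.

```python
import numpy as np

def goe_spectrum(n, rng):
    """Eigenvalues of an n x n random matrix from the Gaussian Orthogonal Ensemble."""
    a = rng.normal(size=(n, n))
    return np.linalg.eigvalsh((a + a.T) / 2.0)

def thermodynamics(levels, temperatures):
    """(free energy, entropy, heat capacity) at each temperature, with k_B = 1."""
    e0 = levels.min()
    out = []
    for t in temperatures:
        w = np.exp(-(levels - e0) / t)   # shifted Boltzmann weights for stability
        z = w.sum()
        p = w / z
        e_mean = (p * levels).sum()
        e_var = (p * (levels - e_mean) ** 2).sum()
        f = e0 - t * np.log(z)           # free energy F = -T ln Z
        out.append((f, (e_mean - f) / t, e_var / t ** 2))
    return out

rng = np.random.default_rng(0)
rows = thermodynamics(goe_spectrum(50, rng), [0.5, 1.0, 2.0])
```

Entropy is recovered as (E - F)/T and heat capacity as the energy variance over T²; both are non-negative by construction.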
Limited Lifespan of Fragile Regions in Mammalian Evolution
An important question in genome evolution is whether there exist fragile
regions (rearrangement hotspots) where chromosomal rearrangements are happening
over and over again. Although nearly all recent studies supported the existence
of fragile regions in mammalian genomes, the most comprehensive phylogenomic
study of mammals (Ma et al. (2006) Genome Research 16, 1557-1565) raised some
doubts about their existence. We demonstrate that fragile regions are subject
to a "birth and death" process, implying that fragility has a limited
evolutionary lifespan. This finding implies that fragile regions migrate to
different locations in different mammals, explaining why only a few
chromosomal breakpoints are shared between different lineages. The birth and
death of fragile regions reinforces the hypothesis that rearrangements are
promoted by matching segmental duplications and suggests putative locations of
the currently active fragile regions in the human genome.
Safe and complete contig assembly via omnitigs
Contig assembly is the first stage that most assemblers solve when
reconstructing a genome from a set of reads. Its output consists of contigs --
a set of strings that are promised to appear in any genome that could have
generated the reads. Since the introduction of contigs 20 years ago, assemblers
have tried to obtain longer and longer contigs, but the following question was
never solved: given a genome graph (e.g. a de Bruijn or a string graph),
what are all the strings that can be safely reported as contigs? In
this paper we finally answer this question, and also give a polynomial-time
algorithm to find them. Our experiments show that these strings, which we call
omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of
dbSNP locations have more neighbors in omnitigs than in unitigs.
Comment: Full version of the paper in the proceedings of RECOMB 201
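Omnitigs generalize unitigs, the baseline the abstract compares against. As a simpler, well-known illustration (not the omnitig algorithm itself), the sketch below extracts unitigs -- maximal non-branching paths -- from a de Bruijn graph built over (k-1)-mers; the graph construction and example reads are illustrative assumptions.

```python
from collections import defaultdict

def debruijn(reads, k):
    """(k-1)-mer de Bruijn graph: one edge per k-mer occurring in the reads."""
    succ, pred = defaultdict(set), defaultdict(set)
    for r in reads:
        for i in range(len(r) - k + 1):
            u, v = r[i:i + k - 1], r[i + 1:i + k]
            succ[u].add(v)
            pred[v].add(u)
    return succ, pred

def unitigs(succ, pred):
    """Maximal non-branching paths (isolated cycles are skipped in this sketch)."""
    nodes = set(succ) | set(pred)
    # "simple" nodes sit in the middle of a unitig: in-degree 1 and out-degree 1
    simple = {u for u in nodes if len(pred[u]) == 1 and len(succ[u]) == 1}
    out = []
    for u in nodes:
        if u in simple:
            continue
        for v in succ[u]:
            path, w = u, v
            while w in simple:            # extend through non-branching nodes
                path += w[-1]
                w = next(iter(succ[w]))
            path += w[-1]
            out.append(path)
    return out

succ, pred = debruijn(["ACGT", "ACGA"], 3)
tigs = sorted(unitigs(succ, pred))
```

With the branching example above, the path splits at the node CG, so three unitigs are reported instead of one full-length contig; omnitigs are precisely the strings that remain safe beyond such splits.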
Space-efficient merging of succinct de Bruijn graphs
We propose a new algorithm for merging succinct representations of de Bruijn
graphs introduced in [Bowe et al., WABI 2012]. Our algorithm is based on the
lightweight BWT merging approach of Holt and McMillan [Bioinformatics 2014,
ACM-BCB 2014]. Our algorithm has the same asymptotic cost as the state-of-the-art
tool for the same problem presented by Muggli et al. [bioRxiv 2017,
Bioinformatics 2019], but it uses less than half of its working space. An
important novel feature of our algorithm, not found in any of the existing
tools, is that it can compute the variable-order succinct representation of the
union graph within the same asymptotic time/space bounds.
Comment: Accepted to SPIRE'1
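The sketch below illustrates only the semantics of the merge -- the union of two sorted k-mer collections -- not the actual succinct algorithm, which operates on BWT-based (BOSS) representations in compressed space without ever materializing the k-mers.

```python
import heapq

def merge_kmer_sets(a, b):
    """Union of two sorted k-mer lists via a single linear merge pass."""
    out = []
    for km in heapq.merge(a, b):        # lazy merge of two sorted streams
        if not out or out[-1] != km:    # collapse duplicates shared by both graphs
            out.append(km)
    return out

merged = merge_kmer_sets(["ACG", "CGT"], ["ACG", "GTA"])
```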
String Matching and 1d Lattice Gases
We calculate the probability distributions for the number of occurrences
of a given word in a random string of letters. Analytical expressions for the
distribution are known in two asymptotic regimes: when the expected number of
matches grows large (Gaussian) and when it remains finite (compound Poisson).
However, it is known that these distributions do not work well in the
intermediate regime. We show that the problem of calculating the string
matching probability can be cast as determining the configurational partition
function of a 1d lattice gas with interacting particles, so that the matching
probability becomes the grand-partition sum of the lattice gas, with the number
of particles corresponding to the number of matches. We perform a virial
expansion of the effective equation of state and obtain the probability
distribution. Our result reproduces the behavior of the distribution in all
regimes. We are also able to show analytically how the limiting distributions
arise. Our analysis builds on the fact that the effective interactions between
the particles consist of a relatively strong core whose size equals the word
length, followed by a weak, exponentially decaying tail. We find that the
asymptotic regimes correspond to the case where the tail of the interactions
can be neglected, while in the intermediate regime it needs to be kept in the
analysis. Our results are readily generalized to the case where the random
strings are generated by more complicated stochastic processes, such as a
non-uniform letter probability distribution or Markov chains. We show that in
these cases the tails of the effective interactions can be made even more
dominant, thus rendering the asymptotic approximations less accurate in such a
regime.
Comment: 44 pages and 8 figures. Major revision of the previous version. The
lattice gas analogy has been worked out in full, including the virial expansion
and equation of state; this constitutes the main part of the paper now.
Connections with existing work are made and references should be up to date.
To be submitted for publication.
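A brute-force way to see the distribution in question is Monte Carlo: count overlapping occurrences of a word in random strings and tabulate the empirical pmf. The alphabet, word, and string length below are arbitrary choices for illustration, not parameters taken from the paper.

```python
import random
from collections import Counter

def count_occurrences(s, w):
    """Number of (possibly overlapping) occurrences of word w in string s."""
    return sum(1 for i in range(len(s) - len(w) + 1) if s[i:i + len(w)] == w)

def match_distribution(n, w, alphabet="ACGT", trials=20000, seed=0):
    """Empirical pmf of the match count over random length-n strings."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        s = "".join(rng.choice(alphabet) for _ in range(n))
        counts[count_occurrences(s, w)] += 1
    return {k: v / trials for k, v in sorted(counts.items())}

pmf = match_distribution(50, "AT")
mean = sum(k * p for k, p in pmf.items())
# for uniform letters the mean is (n - |w| + 1) / 4**|w| = 49/16
```

Sweeping the word length at fixed string length moves the empirical pmf between the Gaussian-like and compound-Poisson-like shapes that the abstract contrasts.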
Applying a User-centred Approach to Interactive Visualization Design
Analysing users in their context of work and finding out how and why they use different information resources is essential to provide interactive visualisation systems that match their goals and needs. Designers should actively involve the intended users throughout the whole process. This chapter presents a user-centred approach for the design of interactive visualisation systems. We describe three phases of the iterative visualisation design process: the early envisioning phase, the global specification phase, and the detailed specification phase. The whole design cycle is repeated until some criterion of success is reached. We discuss different techniques for the analysis of users, their tasks and domain. Subsequently, the design of prototypes and evaluation methods in visualisation practice are presented. Finally, we discuss the practical challenges in the design and evaluation of collaborative visualisation environments. Our own case studies and those of others are used throughout the whole chapter to illustrate various approaches.
Efficacy and safety of left atrial appendage closure in patients with atrial fibrillation and high thromboembolic and bleeding risk
Aim. To compare the incidence of thromboembolic and hemorrhagic events during prospective follow-up of patients with atrial fibrillation (AF), a high risk of ischemic stroke (IS), and contraindications to long-term anticoagulant therapy, managed either with left atrial appendage occlusion (LAAO) or without any prevention of thromboembolic events (TEEs). Material and methods. The study included 134 patients with AF, a high risk of IS, and contraindications to long-term anticoagulation. Patients were divided into 2 groups: the first group included patients who underwent LAAO (n=74), while the second included those who did not receive any TEE prevention (n=60). The follow-up period was 3 years. The cumulative rate of all-cause mortality, IS, transient ischemic attack (TIA), and systemic embolism (SE) was taken as the primary efficacy endpoint. The primary safety endpoint was major bleeding according to the GARFIELD registry criteria. Results. The rate of the composite efficacy endpoint in the LAAO group was significantly lower than in the group without thromboembolic prophylaxis (5.2 vs 17.4 per 100 patient-years; adjusted odds ratio (OR) 4.08; 95% confidence interval (CI): 1.7-9.5; p=0.001). The rate of major bleeding was comparable in both groups (2.4 per 100 patient-years in the LAAO group vs 1.3 in the group without thromboembolic prophylaxis; adjusted OR 0.55; 95% CI: 0.1-3.09; p=0.509). In addition, the event rate for net clinical benefit (all-cause mortality + ischemic stroke/TIA/SE + major bleeding) in the LAAO group was also significantly lower (5.9 vs 18.2 per 100 patient-years; adjusted OR 3.0; 95% CI: 1.47-6.36; p=0.003). Conclusion. Among patients with AF and contraindications to long-term anticoagulation, after 3 years of follow-up LAAO demonstrated a significant reduction in the cumulative rate of all-cause mortality and non-fatal thromboembolic events. At the same time, the frequency of major bleeding was comparable between the groups, even taking into account access-site bleeding and postoperative antithrombotic therapy (ATT)-associated bleeding in the LAAO group. Further randomized clinical trials are required to confirm these data.
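Per-100-patient-year rates like those reported above come from raw counts divided by total follow-up time. The event counts and follow-up totals below are hypothetical values, chosen only to reproduce rates close to the reported ones, since the abstract does not give the raw numbers.

```python
def rate_per_100py(events, patient_years):
    """Incidence rate expressed per 100 patient-years of follow-up."""
    return 100.0 * events / patient_years

# hypothetical raw counts (NOT from the study) chosen to give rates
# close to the reported 5.2 and 17.4 per 100 patient-years
laao_rate = rate_per_100py(events=11, patient_years=210)
ctrl_rate = rate_per_100py(events=30, patient_years=172)
```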
Viral population estimation using pyrosequencing
The diversity of virus populations within single infected hosts presents a
major difficulty for the natural immune response as well as for vaccine design
and antiviral drug therapy. Recently developed pyrophosphate-based sequencing
technologies (pyrosequencing) can be used for quantifying this diversity by
ultra-deep sequencing of virus samples. We present computational methods for
the analysis of such sequence data and apply these techniques to pyrosequencing
data obtained from HIV populations within patients harboring drug resistant
virus strains. Our main result is the estimation of the population structure of
the sample from the pyrosequencing reads. This inference is based on a
statistical approach to error correction, followed by a combinatorial algorithm
for constructing a minimal set of haplotypes that explain the data. Using this
set of explaining haplotypes, we apply a statistical model to infer the
frequencies of the haplotypes in the population via an EM algorithm. We
demonstrate that pyrosequencing reads allow for effective population
reconstruction by extensive simulations and by comparison to 165 sequences
obtained directly from clonal sequencing of four independent, diverse HIV
populations. Thus, pyrosequencing can be used for cost-effective estimation of
the structure of virus populations, promising new insights into viral
evolutionary dynamics and disease control strategies.
Comment: 23 pages, 13 figures
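The EM step described above can be sketched for a generic finite mixture: given a read-by-haplotype likelihood matrix, alternate posterior responsibilities (E-step) and frequency updates (M-step). The likelihood values below are made-up toy numbers, not the paper's error model.

```python
import numpy as np

def em_frequencies(lik, iters=200):
    """Maximum-likelihood mixture weights for haplotypes via EM.

    lik[i, j] is the likelihood of read i under haplotype j (from some
    error model); rows need not be normalized.
    """
    f = np.full(lik.shape[1], 1.0 / lik.shape[1])
    for _ in range(iters):
        post = lik * f                    # E-step: responsibilities
        post /= post.sum(axis=1, keepdims=True)
        f = post.mean(axis=0)             # M-step: update frequencies
    return f

# toy data: three reads favoring haplotype 0, one favoring haplotype 1
lik = np.array([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.1, 0.9]])
freq = em_frequencies(lik)
```

For this toy matrix the fixed point can be checked by hand: the stationarity condition gives a frequency of 13/16 = 0.8125 for haplotype 0.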
Efficient algorithms for analyzing segmental duplications with deletions and inversions in genomes
Background: Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics comprised of multiple duplicated fragments. This complex genomic organization complicates analysis of the evolutionary history of these sequences. One model proposed to explain these mosaic patterns is a model of repeated aggregation and subsequent duplication of genomic sequences. Results: We describe a polynomial-time exact algorithm to compute duplication distance, a genomic distance defined as the most parsimonious way to build a target string by repeatedly copying substrings of a fixed source string. This distance models the process of repeated aggregation and duplication. We also describe extensions of this distance to include certain types of substring deletions and inversions. Finally, we provide a description of a sequence of duplication events as a context-free grammar (CFG). Conclusion: These new genomic distances will permit more biologically realistic analyses of segmental duplications in genomes.
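Duplication distance itself needs a more elaborate dynamic program; as a simplified relative (an assumption of this sketch, not the paper's algorithm), the code below computes the fewest substrings of a source whose plain concatenation yields the target. Greedy longest-match is optimal here because the set of substrings of a string is closed under taking substrings.

```python
def min_copies(source, target):
    """Fewest substrings of source whose concatenation equals target.

    Returns None when some character of target never occurs in source.
    """
    i, copies = 0, 0
    while i < len(target):
        best = 0
        j = i + 1
        # extend the current copied block as far as it still matches source
        while j <= len(target) and target[i:j] in source:
            best = j
            j += 1
        if best == 0:
            return None
        i = best
        copies += 1
    return copies
```

For example, building "GTAC" from "ACGT" takes two copies ("GT" then "AC"); the paper's distance differs in that copies are pasted into a growing target at arbitrary positions.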
ConDeTri - A Content Dependent Read Trimmer for Illumina Data
During the last few years, DNA and RNA sequencing have started to play an increasingly important role in biological and medical applications, especially due to the greater amount of sequencing data yielded by the new sequencing machines and the enormous decrease in sequencing costs. In particular, Illumina/Solexa sequencing has had an increasing impact on gathering data from model and non-model organisms. However, accurate and easy-to-use tools for quality filtering have not yet been established. We present ConDeTri, a method for content-dependent read trimming for next-generation sequencing data using the quality score of each individual base. The main focus of the method is to remove sequencing errors from reads so that sequencing reads can be standardized. Another aspect of the method is to incorporate read trimming in next-generation sequencing data processing and analysis pipelines. It can process single-end and paired-end sequence data of arbitrary length, and it is independent of sequencing coverage and user interaction. ConDeTri is able to trim and remove reads with low quality scores to save computational time and memory usage during de novo assemblies. Low-coverage or large genome sequencing projects will especially gain from trimming reads. The method can easily be incorporated into preprocessing and analysis pipelines for Illumina data.
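A minimal sketch of quality-score-based trimming, assuming Phred+33 ASCII encoding and a single quality threshold; ConDeTri's actual content-dependent rule, with separate high- and low-quality cutoffs, is more elaborate.

```python
def trim_3prime(seq, qual, threshold=20, offset=33):
    """Drop bases from the 3' end while their Phred score is below threshold.

    Assumes Phred+33 encoding, the standard for modern Illumina FASTQ files.
    """
    scores = [ord(c) - offset for c in qual]
    end = len(seq)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], qual[:end]

# 'I' encodes Q40 (kept), '#' encodes Q2 (trimmed)
trimmed = trim_3prime("ACGTACGT", "IIIIII##")
```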