34 research outputs found

    Hidden breakpoints in genome alignments

    During the course of evolution, an organism's genome can undergo changes that affect the large-scale structure of the genome. These changes include gene gain, loss, duplication, chromosome fusion, fission, and rearrangement. When gene gain and loss occur in addition to other types of rearrangement, breakpoints of rearrangement can exist that are only detectable by comparison of three or more genomes. An arbitrarily large number of these "hidden" breakpoints can exist among genomes that exhibit no rearrangements in pairwise comparisons. We present an extension of the multichromosomal breakpoint median problem to genomes that have undergone gene gain and loss. We then demonstrate that the median distance among three genomes can be used to calculate a lower bound on the number of hidden breakpoints present. We provide an implementation of this calculation, including the median distance, along with some practical improvements on the time complexity of the underlying algorithm. We apply our approach to measure the abundance of hidden breakpoints in simulated data sets under a wide range of evolutionary scenarios. We demonstrate that in simulations the hidden breakpoint counts depend strongly on the relative rates of inversion and gene gain/loss. Finally, we apply current multiple genome aligners to the simulated genomes and show that all aligners introduce a high degree of error in hidden breakpoint counts, and that this error grows with evolutionary distance in the simulation. Our results suggest that hidden breakpoint error may be pervasive in genome alignments. Comment: 13 pages, 4 figures
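The pairwise comparison that the abstract contrasts with three-way analysis can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each genome is a single linear chromosome given as a list of signed integers, that every gene occurs at most once per genome, and that chromosome ends are ignored. The function names are hypothetical. It counts breakpoints between two genomes after restricting both to their shared genes, which is the view in which hidden breakpoints, by definition, do not show up.

```python
def restrict(genome, shared):
    """Keep only the genes (by absolute id) that both genomes contain."""
    return [g for g in genome if abs(g) in shared]


def adjacencies(genome):
    """Return the set of signed adjacencies of a linear chromosome.

    The adjacency (x, y) read left to right is the same as (-y, -x) read
    from the other strand, so each one is stored in a canonical form.
    """
    adj = set()
    for x, y in zip(genome, genome[1:]):
        adj.add(max((x, y), (-y, -x)))
    return adj


def induced_breakpoints(g1, g2):
    """Count adjacencies of g1 that are absent from g2, on shared genes only."""
    shared = {abs(g) for g in g1} & {abs(g) for g in g2}
    a1 = adjacencies(restrict(g1, shared))
    a2 = adjacencies(restrict(g2, shared))
    return len(a1 - a2)


if __name__ == "__main__":
    reference = [1, 2, 3, 4]
    inverted = [1, -3, -2, 4]      # genes 2..3 inverted
    with_gain = [1, 5, 2, 3, 4]    # gene 5 gained, no rearrangement
    print(induced_breakpoints(reference, inverted))   # inversion creates breakpoints
    print(induced_breakpoints(reference, with_gain))  # gene gain alone creates none
```

A pairwise count like this only sees rearrangements among genes that two genomes share. Detecting the hidden breakpoints described above requires a three-genome measure such as the median distance, which this sketch does not attempt.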

    Cardiovascular risk estimation and eligibility for statins in primary prevention comparing different strategies.

    Recommendations for statin use for primary prevention of coronary heart disease (CHD) are based on estimation of the 10-year CHD risk. It is unclear which risk algorithm and guidelines should be used in European populations. Using data from a population-based study in Switzerland, we first assessed 10-year CHD risk and eligibility for statins in 5,683 women and men 35 to 75 years of age without cardiovascular disease by comparing recommendations by the European Society of Cardiology without and with extrapolation of risk to age 60 years, the International Atherosclerosis Society, and the US Adult Treatment Panel III. The proportions of participants classified as high-risk for CHD were 12.5% (15.4% with extrapolation), 3.0%, and 5.8%, respectively. Proportions of participants eligible for statins were 9.2% (11.6% with extrapolation), 13.7%, and 16.7%, respectively. Assuming full compliance with each guideline, expected relative decreases in CHD deaths in Switzerland over a 10-year period would be 16.4% (17.5% with extrapolation), 18.7%, and 19.3%, respectively; the corresponding numbers needed to treat to prevent 1 CHD death would be 285 (340 with extrapolation), 380, and 440, respectively. In conclusion, the proportion of subjects classified as high risk for CHD varied over a fivefold range across recommendations. Following the International Atherosclerosis Society and the Adult Treatment Panel III recommendations might prevent more CHD deaths at the cost of higher numbers needed to treat compared with European Society of Cardiology guidelines.

    Machine Learning Approaches for Metagenomics

    Best-case results for nearest-neighbor learning

    A Multi GPU Read Alignment Algorithm with Model-Based Performance Optimization

    Efficient Sequential Clamping for Lifted Message Passing

    Lifted message passing approaches can be extremely fast at computing approximate marginal probability distributions over single variables and neighboring ones in the underlying graphical model. They do not, however, prescribe a way to solve more complex inference tasks such as computing joint marginals for k-tuples of distant random variables or satisfying assignments of CNFs. A popular solution in these cases is to turn the complex inference task into a sequence of simpler ones by selecting and clamping variables one at a time and running lifted message passing again after each selection. This naive solution, however, recomputes the lifted network from scratch in each step, often canceling the benefits of lifted inference. We show how to avoid this by efficiently computing the lifted network for each conditioning directly from the one already known for the single node marginals. Our experiments show that significant efficiency gains are possible for lifted message passing guided decimation for SAT and sampling.

    Model Checking (k, d)-Markov Chain with ipLTL

    BAC Overlap Identification Based on Bit-Vectors

    There is no software that accurately calculates the overlap of two BACs fast enough for application to thousands of cases in turn. The problems include the unacceptably low speed of dynamic programming algorithms for sequences of the considered size and the failure of the faster local alignment methods to identify complete sequence overlaps. Lower sequence quality at both BAC ends, together with internal difference blocks that are small enough not to significantly increase relative error rates but large enough to terminate local alignments, causes the output of multiple overlapping local matches that do not extend to both sequence ends. Based on Myers' bit-vector algorithm for fast edit distance calculation, we developed the program BACOLAP, which identifies overlapping BACs with the sensitivity of global dynamic programming alignment and the speed of local heuristic alignment.
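For readers unfamiliar with the bit-vector technique named above, the following is a minimal sketch of a Myers-style bit-parallel computation of the global (Levenshtein) edit distance, in the formulation commonly attributed to Hyyrö. It is not BACOLAP and does not perform overlap detection; the function name is hypothetical, and Python's arbitrary-precision integers stand in for the machine words a real implementation would use.

```python
def myers_edit_distance(pattern, text):
    """Global edit distance between pattern and text via bit-parallel DP.

    Each bit of pv/mv encodes whether the vertical difference between
    adjacent cells of the current DP column is +1 / -1.
    """
    m = len(pattern)
    if m == 0:
        return len(text)
    mask = (1 << m) - 1          # keep all vectors m bits wide
    high = 1 << (m - 1)          # bit of the last pattern row

    # peq[c] has bit i set where pattern[i] == c
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv, score = mask, 0, m   # first column is 0, 1, ..., m
    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal difference +1
        mh = pv & xh                    # horizontal difference -1
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift in a 1: the top DP row grows by 1 per text character
        # (global distance, not approximate substring search).
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
    return score


if __name__ == "__main__":
    print(myers_edit_distance("kitten", "sitting"))
```

Each text character costs a constant number of word operations per block of pattern bits, which is where the speed advantage over cell-by-cell dynamic programming comes from.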

    Chaining algorithms for alignment of draft sequence

    In this paper we propose a chaining method that can align a draft genomic sequence against a finished genome. We introduce the use of an overlap tree to enhance the state information available to the chaining procedure in the context of sparse dynamic programming, and demonstrate that the resulting procedure more accurately penalizes the various biological rearrangements. The algorithm is tested on a whole genome alignment of seven yeast species. We also demonstrate a variation on the algorithm that can be used for coassembly of two genomes and show how it can improve the current assembly of the Ciona savignyi (sea squirt) genome.
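As background for the chaining problem mentioned above, here is a minimal sketch of basic colinear chaining using a quadratic-time dynamic program. It is deliberately simpler than the method in the abstract: it uses neither an overlap tree nor sparse dynamic programming, and it applies no gap or rearrangement penalties. The anchor format `(x, y, length)`, meaning an exact match starting at position `x` in one sequence and `y` in the other, and the function name are assumptions made for illustration.

```python
def best_chain(anchors):
    """Return (total_length, chain) for the highest-scoring colinear chain.

    An anchor j may precede anchor i only if j ends at or before the
    start of i in both sequences.
    """
    if not anchors:
        return 0, []
    anchors = sorted(anchors)            # by start in the first sequence
    n = len(anchors)
    score = [a[2] for a in anchors]      # best chain score ending at each anchor
    prev = [-1] * n                      # back-pointers for traceback

    for i in range(n):
        xi, yi, li = anchors[i]
        for j in range(i):
            xj, yj, lj = anchors[j]
            if xj + lj <= xi and yj + lj <= yi and score[j] + li > score[i]:
                score[i] = score[j] + li
                prev[i] = j

    # Trace back from the best-scoring end anchor.
    i = max(range(n), key=score.__getitem__)
    total = score[i]
    chain = []
    while i != -1:
        chain.append(anchors[i])
        i = prev[i]
    return total, chain[::-1]


if __name__ == "__main__":
    anchors = [(0, 0, 5), (6, 7, 4), (3, 20, 10), (12, 12, 3)]
    print(best_chain(anchors))
```

Sparse dynamic programming replaces the inner loop with a range query over previously processed anchors, which is what makes chaining practical at whole-genome scale.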