193 research outputs found

    SANA NetGO: A combinatorial approach to using Gene Ontology (GO) terms to score network alignments

    Gene Ontology (GO) terms are frequently used to score alignments between protein-protein interaction (PPI) networks. Methods exist to measure the GO similarity between two proteins in isolation, but pairs of proteins in a network alignment are not isolated: each pairing is implicitly dependent upon every other pairing via the alignment itself. Current methods fail to take into account the frequency of GO terms across the networks, and attempt to account for common GO terms in an ad hoc fashion by imposing arbitrary rules on when to "allow" GO terms based on their location in the GO hierarchy, rather than using readily available frequency information in the PPI networks themselves. Here we develop a new measure, NetGO, that naturally weighs infrequent, informative GO terms more heavily than frequent, less informative GO terms, without requiring arbitrary cutoffs. In particular, NetGO down-weights the score of frequent GO terms according to their frequency in the networks being aligned. This is a global measure applicable only to alignments, independent of pairwise GO measures, in the same sense that the edge-based EC or S3 scores are global measures of topological similarity independent of pairwise topological similarities. We demonstrate the superiority of NetGO by creating alignments of predetermined quality based on homologous pairs of nodes and show that NetGO correlates with alignment quality much better than any existing GO-based alignment measures. We also demonstrate that NetGO provides a measure of taxonomic similarity between species, consistent with existing taxonomic measures, a feature not shared with existing GO-based network alignment measures. Finally, we re-score alignments produced by almost a dozen aligners from a previous study and show that NetGO does a better job than existing measures at separating good alignments from bad ones.

    Shadowing-based reliability decay in softened n-body simulations

    A shadow of a numerical solution to a chaotic system is an exact solution to the equations of motion that remains close to the numerical solution for a long time. In a collisionless n-body system, we know that particle motion is governed by the global potential rather than by inter-particle interactions. As a result, the trajectory of each individual particle in the system is independently shadowable. It is thus meaningful to measure the number of particles that have shadowable trajectories as a function of time. We find that the number of shadowable particles decays exponentially with time as exp(-mu t), and that for eps in [~0.2,1] (in units of the local mean inter-particle separation $\bar n$), there is an explicit relationship between the decay constant mu, the timestep h of the leapfrog integrator, the softening eps, and the number of particles N in the simulation. Thus, given N and eps, it is possible to pre-compute the timestep h necessary to achieve a desired fraction of shadowable particles after a given length of simulation time. We demonstrate that a large fraction of particles remain shadowable over ~100 crossing times even if particles travel up to about 1/3 of the softening length per timestep. However, a sharp decrease in the number of shadowable particles occurs if the timestep increases to allow particles to travel further than 1/3 the softening length in one timestep, or if the softening is decreased below ~0.2 $\bar n$. Comment: 4 pages, 5 figures
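
The exponential decay quoted in this abstract can be inverted to see what decay constant a target shadowable fraction implies. The sketch below is only an illustration: it assumes the fraction is normalized to 1 at t = 0, the function names are hypothetical, and it does not reproduce the paper's relationship between mu, h, eps, and N, which the abstract does not state.

```python
import math


def shadowable_fraction(mu: float, t: float) -> float:
    """Fraction of particles still shadowable at time t, assuming exp(-mu * t)."""
    return math.exp(-mu * t)


def max_decay_constant(target_fraction: float, t: float) -> float:
    """Largest mu for which at least target_fraction remains shadowable at time t.

    Obtained by solving target_fraction = exp(-mu * t) for mu.
    """
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must lie in (0, 1]")
    if t <= 0.0:
        raise ValueError("t must be positive")
    return -math.log(target_fraction) / t


# Hypothetical example: keep 90% of particles shadowable after 100 crossing times.
mu_limit = max_decay_constant(0.9, 100.0)
print(mu_limit, shadowable_fraction(mu_limit, 100.0))
```

With the paper's explicit relationship in hand, a bound on mu like this one would be the input for choosing the timestep h.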

    The interplay of chaos between the terrestrial and giant planets

    We report on some simple experiments on the nature of chaos in our planetary system. We make the following interesting observations. First, we look at the system of Sun + four Jovian planets as an isolated five-body system interacting only via Newtonian gravity. We find that if we measure the Lyapunov time of this system across thousands of initial conditions all within observational uncertainty, then the value of the Lyapunov time seems relatively smooth across some regions of initial condition space, while in other regions it fluctuates wildly on scales as small as we can reliably measure using numerical methods. This probably indicates a fractal structure of Lyapunov exponents measured across initial condition space. Then, we add the four inner terrestrial planets and several post-Newtonian corrections such as general relativity into the model. In this more realistic Sun + eight-planet system, we find that the above structure of chaos for the outer planets becomes uniformly chaotic for almost all planets and almost all initial conditions, with a Lyapunov time-scale of about 5-20 Myr. This seems to indicate that the addition of the inner planets adds more chaos to the system. Finally, we show that if we instead remove the outer planets and look at the isolated five-body system of the Sun + four terrestrial planets, then the terrestrial planets alone show no evidence of chaos at all, over a large range of initial conditions inside the observational error volume. We thus conclude that the uniformity of chaos in the outer planets comes not from the inner planets themselves, but from the interplay between the outer and inner ones. Interestingly, however, there exist rare and isolated initial conditions for which one individual outer planetary orbit may appear integrable over a 200-Myr time-scale, while all the other planets simultaneously appear chaotic. © 2010 The Authors. Journal compilation © 2010 RAS

    SANA: simulated annealing far outperforms many other search algorithms for biological network alignment

    Summary: Every alignment algorithm consists of two orthogonal components: an objective function M measuring the quality of an alignment, and a search algorithm that explores the space of alignments looking for ones scoring well according to M. We introduce a new search algorithm called SANA (Simulated Annealing Network Aligner) and apply it to protein-protein interaction networks using S3 as the topological measure. Compared against 12 recent algorithms, SANA produces 5-10 times as many correct node pairings as the others when the correct answer is known. We expose an anti-correlation in many existing aligners between their ability to produce good topological vs. functional similarity scores, whereas SANA usually outscores other methods in both measures. If given the perfect objective function encoding the identity mapping, SANA quickly converges to the perfect solution while many other algorithms falter. We observe that when aligning networks with a known mapping and optimizing only S3, SANA creates alignments that are not perfect and yet whose S3 scores match that of the perfect alignment. We call this phenomenon saturation of the topological score. Saturation implies that a measure's correlation with alignment correctness falters before the perfect alignment is reached. This, combined with SANA's ability to produce the perfect alignment if given the perfect objective function, suggests that better objective functions may lead to dramatically better alignments. We conclude that future work should focus on finding better objective functions, and offer SANA as the search algorithm of choice. Availability and implementation: Software available at http://sana.ics.uci.edu. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online
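
The split between objective function and search algorithm described in this abstract can be illustrated with a toy example. This is a minimal sketch, not SANA itself: it assumes the usual definition of S3 (aligned edges divided by the edges of the first network plus the edges induced in the second network minus the aligned edges), uses only swap moves with a geometric cooling schedule, and every function and parameter name is hypothetical.

```python
import math
import random


def s3_score(edges1, edges2, mapping):
    """S3 = aligned / (|E1| + |E2 induced on the image of the mapping| - aligned)."""
    e2 = {frozenset(e) for e in edges2}
    image = set(mapping.values())
    aligned = sum(1 for u, v in edges1
                  if frozenset((mapping[u], mapping[v])) in e2)
    induced = sum(1 for e in e2 if e <= image)
    denom = len(edges1) + induced - aligned
    return aligned / denom if denom else 0.0


def anneal(nodes1, nodes2, edges1, edges2, steps=5000,
           t_start=1.0, t_end=1e-3, seed=0):
    """Toy simulated annealing over injective mappings nodes1 -> nodes2."""
    rng = random.Random(seed)
    nodes1 = list(nodes1)
    targets = list(nodes2)
    rng.shuffle(targets)
    mapping = dict(zip(nodes1, targets))  # assumes len(nodes1) <= len(nodes2)
    score = s3_score(edges1, edges2, mapping)
    best, best_score = dict(mapping), score
    for i in range(steps):
        temp = t_start * (t_end / t_start) ** (i / steps)  # geometric cooling
        a, b = rng.sample(nodes1, 2)
        mapping[a], mapping[b] = mapping[b], mapping[a]    # propose a swap
        new = s3_score(edges1, edges2, mapping)
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if new >= score or rng.random() < math.exp((new - score) / temp):
            score = new
            if score > best_score:
                best, best_score = dict(mapping), score
        else:
            mapping[a], mapping[b] = mapping[b], mapping[a]  # undo the swap
    return best, best_score


# Align a 4-node path with itself; the identity mapping scores exactly 1.0.
path = [(0, 1), (1, 2), (2, 3)]
print(s3_score(path, path, {n: n for n in range(4)}))
print(anneal(range(4), range(4), path, path)[1])
```

Recomputing the full score after every move keeps the sketch short; a practical aligner would update the score incrementally and also allow moves that remap a node to an unused target.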

    Improving Phrap-Based Assembly of the Rat Using “Reliable” Overlaps

    The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of “reliable” overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version PhrapUMD. Integrating PhrapUMD and our “reliable-overlap” algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the Rattus norvegicus genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at http://www.genome.umd.edu. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps.
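
The "seven times better" figure in this abstract can be checked from the numbers it quotes. The sketch below assumes that "sequence missed" means 100% minus the coverage of finished sequence; the function name is hypothetical.

```python
def relative_quality(err_old, cov_old, err_new, cov_new):
    """Ratio of new to old quality, with quality = 1 / (error rate * sequence missed).

    Error rates are in errors per 10,000 bases; coverage is in percent.
    Sequence missed is taken as 100 - coverage (an assumption).
    """
    missed_old = 100.0 - cov_old
    missed_new = 100.0 - cov_new
    return (err_old * missed_old) / (err_new * missed_new)


# Atlas assembly: 4.5 errors per 10,000 bases at 93.4% coverage.
# Assembly in the abstract: 1.1 errors per 10,000 bases at 96.3% coverage.
ratio = relative_quality(4.5, 93.4, 1.1, 96.3)
print(round(ratio, 1))  # about 7.3
```

The product of error rate and missed sequence falls from 4.5 × 6.6 = 29.7 to 1.1 × 3.7 = 4.07, a ratio of roughly 7.3, which matches the abstract's rounded claim.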