101 research outputs found

    A standalone version of IsoFinder for the computational prediction of isochores in genome sequences

    Isochores are long genome segments that are relatively homogeneous in G+C content. Our group has developed a heuristic algorithm based on entropic segmentation, and a web server implementing all the required components is available. However, a researcher may want to batch-process many sequences on a local machine instead of analyzing them one by one through the web. To this end, standalone versions are required. We report here the implementation of two standalone programs able to predict isochores at the sequence level: 1) a command-line version (IsoFinder) for Windows and Linux systems; and 2) a user-friendly version (IsoFinderWin) running under Windows. Comment: 7 pages, 3 figures

    Isochores Merit the Prefix 'Iso'

    The isochore concept in the human genome sequence was challenged in an analysis by the International Human Genome Sequencing Consortium (IHGSC). We argue here that a statement in the IHGSC analysis concerning the existence of isochores is incorrect, because it applied an inappropriate statistical test. Testing for the existence of isochores should be equivalent to testing the homogeneity of windowed GC%. The statistical test applied in the IHGSC's analysis, the binomial test, is however a test of whether a sequence is random at the base level. For testing the existence of isochores, or homogeneity in GC%, we propose to use another statistical test: the analysis of variance (ANOVA). It can be shown that DNA sequences that are rejected by the binomial test may not be rejected by the ANOVA test. Comment: 14 pages (including 1 figure), submitted
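
A minimal sketch of the idea of testing homogeneity of windowed GC% with a one-way ANOVA F statistic. The function names, the non-overlapping window scheme, and the way windows are grouped into blocks are assumptions made for illustration; the abstract does not specify them. The sketch returns only the F statistic and does not compute a p-value.

```python
def windowed_gc(seq, window):
    """GC fraction in consecutive non-overlapping windows (trailing remainder dropped)."""
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, window)
    ]


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers.

    Assumes at least two groups, more values than groups, and
    non-zero within-group variance (otherwise division by zero).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ssb / (k - 1)) / (ssw / (n - k))
```

In this sketch each group would hold the windowed GC% values of one block of a candidate isochore; a small F suggests the blocks are compositionally homogeneous.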

    Simplifying the mosaic description of DNA sequences

    By using the Jensen-Shannon divergence, genomic DNA can be divided into compositionally distinct domains through a standard recursive segmentation procedure. Each domain, while significantly different from its neighbours, may however share compositional similarity with one or more distant (non-neighbouring) domains. We thus obtain a coarse-grained description of the given DNA string in terms of a smaller set of distinct domain labels. This yields a minimal domain description of a given DNA sequence, significantly reducing its organizational complexity. This procedure gives a new means of evaluating genomic complexity as one examines organisms ranging from bacteria to human. The mosaic organization of DNA sequences could have originated from the insertion of fragments of one genome (the parasite) inside another (the host), and we present numerical experiments that are suggestive of this scenario. Comment: 16 pages, 1 figure, Accepted for publication in Phys. Rev.
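
As an illustration of one step of such a segmentation, the sketch below computes the length-weighted Jensen-Shannon divergence between the two halves of a sequence for every cut point and returns the best cut. The function names are hypothetical, the scan is a simple quadratic-time one, and the significance test that decides whether a cut is kept is not shown.

```python
import math
from collections import Counter


def shannon_entropy(counts, total):
    """Shannon entropy in bits of a symbol-count table."""
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def js_divergence_at(seq, cut):
    """Length-weighted Jensen-Shannon divergence between seq[:cut] and seq[cut:]."""
    left, right = seq[:cut], seq[cut:]
    n = len(seq)
    h_whole = shannon_entropy(Counter(seq), n)
    h_left = shannon_entropy(Counter(left), len(left))
    h_right = shannon_entropy(Counter(right), len(right))
    return h_whole - (len(left) / n) * h_left - (len(right) / n) * h_right


def best_cut(seq):
    """Return (cut, divergence) for the cut point maximizing the divergence."""
    return max(
        ((cut, js_divergence_at(seq, cut)) for cut in range(1, len(seq))),
        key=lambda pair: pair[1],
    )
```

A recursive procedure would apply `best_cut` again to each of the two resulting segments for as long as the cut passes the chosen significance criterion.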

    Phylogenetic distribution of large-scale genome patchiness

    [Background] The phylogenetic distribution of large-scale genome structure (i.e. mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences, and the recently developed computational methods to directly analyze patchiness on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. [Results] The local variations in the scaling exponent of the Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and directly uncover the characteristic scales present in genome sequences. Furthermore, through shuffling experiments of selected genome regions, computationally-identified, isochore-like regions were identified as the biological source for the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris), birds (Gallus gallus), fishes (Danio rerio), invertebrates (Drosophila melanogaster and Caenorhabditis elegans), plants (Arabidopsis thaliana) and yeasts (Saccharomyces cerevisiae). We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. [Conclusion] Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level. This work was supported by the Spanish Government (BIO2005-09116-C03-01) and Plan Andaluz de Investigación (CVI-162, P06-FQM-01858, P07-FQM-03163 and TIC-640)

    Finite-sample frequency distributions originating from an equiprobability distribution

    Given an equidistribution of probabilities p(i)=1/N, i=1..N, what is the expected corresponding rank-ordered frequency distribution f(i), i=1..N, if an ensemble of M events is drawn? Comment: 4 pages, 4 figures
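
The question can be explored numerically. The sketch below draws M events uniformly from N categories and returns the observed counts sorted in descending rank order; the function name and the fixed seed are assumptions made for illustration, and a single run gives one sample, not the expected distribution.

```python
import random
from collections import Counter


def rank_ordered_frequencies(n_categories, n_events, seed=0):
    """Counts per category after n_events uniform draws, sorted from largest to smallest."""
    rng = random.Random(seed)  # seeded for reproducibility
    counts = Counter(rng.randrange(n_categories) for _ in range(n_events))
    # Include categories that were never drawn, with a count of zero.
    freqs = [counts.get(i, 0) for i in range(n_categories)]
    return sorted(freqs, reverse=True)
```

Averaging the result element-wise over many seeds would approximate the expected rank-ordered distribution f(i).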

    Scaling analysis of multivariate intermittent time series

    The scaling properties of the time series of asset prices and trading volumes of stock markets are analysed. It is shown that, similarly to asset prices, trading volume data obey a multi-scaling length distribution of low-variability periods. In the case of asset prices, such scaling behaviour can be used for risk forecasts: the probability of observing a large price movement on the next day is (super-universally) inversely proportional to the length of the ongoing low-variability period. Finally, a method is devised for a multi-factor scaling analysis. We apply the simplest, two-factor model to equity index and trading volume time series. Comment: 16 pages, 5 figures, accepted for publication in Physica

    New stopping criteria for segmenting DNA sequences

    We propose a solution to the problem of choosing a stopping criterion when segmenting inhomogeneous DNA sequences with complex statistical patterns. This new stopping criterion is based on the Bayesian Information Criterion (BIC) in the model selection framework. When this stopping criterion is applied to a left telomere sequence of the yeast Saccharomyces cerevisiae and the complete genome sequence of the bacterium Escherichia coli, borders of biologically meaningful units were identified (e.g. subtelomeric units, replication origin, and replication terminus), and a more reasonable number of domains was obtained. We also introduce a measure called segmentation strength, which can be used to control the delineation of large domains. The relationship between the average domain size and the threshold of segmentation strength is determined for several genome sequences. Comment: 4 pages, 4 figures, Physical Review Letters, to appear
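
A rough sketch of how a BIC-style stopping rule can be expressed. It assumes, without confirmation from the abstract, that a split is accepted when twice the log-likelihood gain (2·N·D, with D the Jensen-Shannon divergence in nats) exceeds the BIC penalty for the extra parameters, and that a split of a four-letter sequence adds 4 parameters (3 base frequencies plus the cut position). The function names and the form of the strength measure are likewise assumptions.

```python
import math


def bic_accepts_split(n, d_js_nats, extra_params=4):
    """True if the two-segment model has lower BIC than the one-segment model.

    n            -- sequence length
    d_js_nats    -- length-weighted Jensen-Shannon divergence at the cut, in nats
    extra_params -- assumed number of parameters added by the split
    """
    return 2 * n * d_js_nats > extra_params * math.log(n)


def segmentation_strength(n, d_js_nats, extra_params=4):
    """Relative margin by which the split clears the BIC penalty (assumed form)."""
    penalty = extra_params * math.log(n)
    return (2 * n * d_js_nats - penalty) / penalty
```

Under these assumptions, requiring the strength to exceed a positive threshold instead of zero yields fewer and larger domains.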

    Phase Transition in a Random Fragmentation Problem with Applications to Computer Science

    We study a fragmentation problem where an initial object of size x is broken into m random pieces provided x>x_0, where x_0 is an atomic cut-off. Subsequently the fragmentation process continues for each of those daughter pieces whose sizes are bigger than x_0. The process stops when all the fragments have sizes smaller than x_0. We show that the fluctuation of the total number of splitting events, characterized by the variance, generically undergoes a nontrivial phase transition as one tunes the branching number m through a critical value m=m_c. For m<m_c, the fluctuations are Gaussian, whereas for m>m_c they are anomalously large and non-Gaussian. We apply this general result to analyze two different search algorithms in computer science. Comment: 5 pages RevTeX, 3 figures (.eps)
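
The process can be simulated directly. The sketch below counts splitting events for one realization; the choice of break rule (m-1 uniform cut points, so that piece sizes are uniform on the simplex) and the function name are assumptions, since the abstract does not state how the random pieces are drawn.

```python
import random


def count_splits(x, x0, m, seed=0):
    """Number of splitting events when an object of size x fragments down to cut-off x0."""
    if m < 2:
        raise ValueError("branching number m must be at least 2")
    rng = random.Random(seed)  # seeded for reproducibility
    stack = [x]
    splits = 0
    while stack:
        size = stack.pop()
        if size <= x0:
            continue  # atomic piece, no further fragmentation
        splits += 1
        # Assumed break rule: m - 1 uniform cut points on the unit interval.
        cuts = sorted(rng.random() for _ in range(m - 1))
        edges = [0.0] + cuts + [1.0]
        stack.extend(size * (b - a) for a, b in zip(edges, edges[1:]))
    return splits
```

Running this over many seeds and taking the variance of the returned counts gives a numerical handle on the fluctuations discussed in the abstract.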

    Effects of coarse-graining on the scaling behavior of long-range correlated and anti-correlated signals

    We investigate how various coarse-graining methods affect the scaling properties of long-range power-law correlated and anti-correlated signals, quantified by the detrended fluctuation analysis. Specifically, for coarse-graining in the magnitude of a signal, we consider (i) the Floor, (ii) the Symmetry and (iii) the Centro-Symmetry coarse-graining methods. We find that for anti-correlated signals coarse-graining in the magnitude leads to a crossover to random behavior at large scales, and that with increasing width of the coarse-graining partition interval Δ this crossover moves to intermediate and small scales. In contrast, the scaling of positively correlated signals is less affected by the coarse-graining, with no observable changes when Δ<1, while for Δ>1 a crossover appears at small scales and moves to intermediate and large scales with increasing Δ. For very rough coarse-graining (Δ>3) based on the Floor and Symmetry methods, the position of the crossover stabilizes, in contrast to the Centro-Symmetry method where the crossover continuously moves across scales and leads to a random behavior at all scales, thus indicating a much stronger effect of the Centro-Symmetry compared to the Floor and the Symmetry methods. For coarse-graining in time, where data points are averaged in non-overlapping time windows, we find that the scaling for both anti-correlated and positively correlated signals is practically preserved. The results of our simulations are useful for the correct interpretation of the correlation and scaling properties of symbolic sequences. Comment: 19 pages, 13 figures
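
Coarse-graining in time, as described above, amounts to averaging data points in non-overlapping windows. A minimal sketch follows; the function name is hypothetical, and dropping an incomplete trailing window is an assumption.

```python
def coarse_grain_time(signal, window):
    """Average a sequence of numbers over consecutive non-overlapping windows.

    Any incomplete trailing window is dropped (an assumption of this sketch).
    """
    if window < 1:
        raise ValueError("window must be a positive integer")
    return [
        sum(signal[i:i + window]) / window
        for i in range(0, len(signal) - window + 1, window)
    ]
```

For example, `coarse_grain_time([1, 2, 3, 4, 5, 6], 2)` averages the pairs (1, 2), (3, 4) and (5, 6).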