22 research outputs found

    Computing the Skewness of the Phylogenetic Mean Pairwise Distance in Linear Time

    The phylogenetic Mean Pairwise Distance (MPD) is one of the most popular measures for computing the phylogenetic distance among the species of a given group. More specifically, for a phylogenetic tree T and for a set of species R represented by a subset of the leaf nodes of T, the MPD of R is equal to the average cost of all possible simple paths in T that connect pairs of nodes in R. Among other phylogenetic measures, the MPD is used as a tool for deciding whether the species of a given group R are closely related. To do this, it is important to compute not only the value of the MPD for this group but also the expectation, the variance, and the skewness of this metric. Although efficient algorithms have been developed for computing the expectation and the variance of the MPD, there has been no approach so far for computing the skewness of this measure. In the present work we describe how to compute the skewness of the MPD on a tree T optimally, in Θ(n) time; here n is the size of the tree T. So far this is the first result that leads to an exact, let alone efficient, computation of the skewness for any popular phylogenetic distance measure. Moreover, we show how to compute in Θ(n) time several interesting quantities in T that could be used as building blocks for efficiently computing the skewness of other phylogenetic measures. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013).
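The definition of the MPD given in the abstract above can be illustrated with a minimal Python sketch. It is a naive, quadratic-time computation of the MPD value only, not the linear-time skewness algorithm of the paper. The tree encoding (a dict mapping each non-root node to its parent and edge cost), the function names, and the example tree are all assumptions made for illustration.

```python
from itertools import combinations


def costs_to_ancestors(tree, node):
    """Map every ancestor of `node` (and `node` itself) to the path cost from `node`."""
    costs = {node: 0.0}
    total = 0.0
    while node in tree:  # the root has no entry in `tree`
        parent, weight = tree[node]
        total += weight
        costs[parent] = total
        node = parent
    return costs


def path_cost(tree, u, v):
    """Cost of the unique simple path between nodes u and v."""
    up_from_u = costs_to_ancestors(tree, u)
    total = 0.0
    node = v
    # Walk up from v until we reach the lowest common ancestor with u.
    while node not in up_from_u:
        parent, weight = tree[node]
        total += weight
        node = parent
    return total + up_from_u[node]


def mean_pairwise_distance(tree, species):
    """Average path cost over all unordered pairs of the given leaf nodes."""
    pairs = list(combinations(species, 2))
    if not pairs:
        raise ValueError("the MPD needs at least two species")
    return sum(path_cost(tree, u, v) for u, v in pairs) / len(pairs)


# Hypothetical example tree: root r with children x and c; x has leaves a and b.
EXAMPLE_TREE = {
    "x": ("r", 1.0),
    "c": ("r", 3.0),
    "a": ("x", 2.0),
    "b": ("x", 2.0),
}

if __name__ == "__main__":
    print(mean_pairwise_distance(EXAMPLE_TREE, ["a", "b", "c"]))
```

In this example the pairwise path costs are 4 for (a, b) and 6 for both (a, c) and (b, c), so the MPD of {a, b, c} is 16/3.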

    Uncovering the missing routes: an algorithmic study of the illicit antiquities trade network

    No abstract available

    Analysis of flow and visibility on triangulated terrains

    Landscapes and their morphology have been widely studied for predicting physical phenomena, such as floods or erosion, but also for planning human activities effectively, such as building prominent fortifications and watchtowers. Nowadays, the study of terrains is done in a computer-based environment; terrains are modelled by digital representations, and algorithms are used to simulate physical processes like water flow and to compute attributes like visibility from certain locations. In the current thesis we focus on designing new algorithms for computing structures related to water flow and visibility on digital terrain representations. More specifically, the terrain representations that we consider are the so-called Triangulated Irregular Networks (TINs), that is, piecewise linear surfaces that consist of triangles. One of the problems considered is the effect of noise on the worst-case complexity of visibility structures on TINs. The view that a person can have from a point on the surface of a TIN can be very complex, since in the worst case thin obstacles in the foreground may appear to fragment many long terrain edges in the background into visible and invisible pieces. In our analysis we consider TINs whose triangles have some well-defined properties that terrains in practice are expected to have. Although complex visibility structures can be induced on such TINs as well, we prove formally that slight perturbations of the elevations of the TIN vertices always eliminate the high complexity. Another key problem studied is the design of efficient algorithms that compute flow-related structures on TINs. So far it was known that, in the case of TINs, drainage structures computed using a consistent flow model could have high complexity for specific input instances. We developed a mechanism that can extract important information on flow paths and other drainage structures without computing those structures explicitly.
    This mechanism can be used as a basis for designing a variety of efficient algorithms, such as for computing the area measure of drainage structures or for computing structures that represent the terrain topology. The last part of the presented work involves the implementation of a software package that computes drainage structures on TINs. In this package flow is modelled as strictly following the direction of steepest descent on the TIN surface. Existing software for related applications either constrains flow to the edge set of the TIN or uses inexact arithmetic, both of which introduce imprecise or incorrect results in the output. Our implementation is the first one that both follows a robust flow model and uses exact arithmetic. We have used this implementation as a point of reference for experimentally evaluating the quality of the output of other flow models that are used in many hydrological applications. We have also used our software for conducting experiments on extracting watersheds on imprecise TINs, that is, TINs where the elevation values of the vertices are not exactly defined but are subject to noise from some given interval. Based on the results of these experiments, we have designed a novel method for extracting watersheds on imprecise terrains that produces high-quality output.

    Fast Computations for Measures of Phylogenetic Beta Diversity.

    For many applications in ecology, it is important to examine the phylogenetic relations between two communities of species. More formally, let [Formula: see text] be a phylogenetic tree and let A and B be two samples of its tips, representing the examined communities. We want to compute a value that expresses the phylogenetic diversity between A and B in [Formula: see text]. There exist several measures that can do this; these are the so-called phylogenetic beta diversity (β-diversity) measures. Two popular measures of this kind are the Community Distance (CD) and the Common Branch Length (CBL). In most applications, it is not sufficient to compute the value of a beta diversity measure for two communities A and B; we also want to know if this value is relatively large or small compared to all possible pairs of communities in [Formula: see text] that have the same size. To decide this, the ideal approach is to compute a standardised index that involves the mean and the standard deviation of this measure among all pairs of species samples that have the same number of elements as A and B. However, no method exists for computing this index exactly and efficiently for CD and CBL. We present analytical expressions for computing the expectation and the standard deviation of CD and CBL. Based on these expressions, we describe efficient algorithms for computing the standardised indices of the two measures. Using standard algorithmic analysis, we provide guarantees on the theoretical efficiency of our algorithms. We implemented our algorithms and measured their efficiency in practice. Our implementations compute the standardised indices of CD and CBL in less than twenty seconds for a hundred pairs of samples on trees with 7 ⋅ 10^4 tips. Our implementations are available through the R package PhyloMeasures.
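The abstract above does not spell out the standardised index. A common form for an index built from a mean and a standard deviation is the z-score below, and this form is an assumption rather than a quotation from the paper. Here M stands for either CD or CBL, and A' and B' (names introduced only for this sketch) range over all pairs of tip samples with the same sizes as A and B.

```latex
\[
  \mathrm{SES}_{M}(A,B) \;=\;
  \frac{M(A,B) - \mathbb{E}\!\left[M(A',B')\right]}
       {\sqrt{\operatorname{Var}\!\left[M(A',B')\right]}},
  \qquad |A'| = |A|,\; |B'| = |B|.
\]
```

Under this reading, a value well below zero would suggest that A and B are phylogenetically closer than a typical pair of samples of the same sizes, and a value well above zero that they are further apart.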

    Running times of implemented algorithms computing values and standardized indices for CD and CBL.

    For each implementation and for each tree size, the figures illustrate the time that it takes for the function to process a set of one hundred samples.

    Fast generation of multiple resolution instances of raster data sets

    In many GIS applications it is important to study the characteristics of a raster data set at multiple resolutions. Often this is done by generating several coarser resolution rasters from a fine resolution raster. In this paper we describe efficient algorithms for different variants of this problem. Given a raster G of √N × √N cells we first consider the problem of computing for every 2 ≤ µ ≤ √N a raster Gµ of √N/µ × √N/µ cells such that each cell of Gµ stores the average of the values of µ × µ cells of G. We describe an algorithm that solves this problem in Θ(N) time when the handled data fit in the main memory of the computer. We also provide two algorithms that solve this problem in external memory, that is, when the input raster is larger than the main memory. The first external algorithm is very easy to implement and requires O(sort(N)) data block transfers from/to the external memory, and the second algorithm requires only O(scan(N)) transfers, where sort(N) and scan(N) are the number of transfers needed to sort and scan N elements, respectively. We also study a variant of the problem where instead of the full input raster we handle only a connected subregion of arbitrary shape. For this variant we describe an algorithm that runs in Θ(U log N) time in internal memory, where U is the size of the output. We show how this algorithm can be adapted to perform efficiently in the external memory using O(sort(U)) data transfers from the disk. We have also implemented two of the presented algorithms, the O(sort(N)) external memory algorithm for full rasters, and the internal memory algorithm that handles connected subregions, and we demonstrate their efficiency in practice.
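The full-raster problem described above can be sketched in Python for the in-memory case. This sketch uses a summed-area table so that each coarse cell costs a constant number of operations; it is one simple approach and is not claimed to be the algorithm from the paper. Taking the floor of the side length divided by µ and ignoring leftover rows and columns is an assumption, as are all function names.

```python
def summed_area_table(grid):
    """table[i][j] holds the sum of grid[0..i-1][0..j-1]; row and column 0 are zero."""
    size = len(grid)
    table = [[0.0] * (size + 1) for _ in range(size + 1)]
    for i in range(size):
        row_sum = 0.0
        for j in range(size):
            row_sum += grid[i][j]
            table[i + 1][j + 1] = table[i][j + 1] + row_sum
    return table


def coarser_rasters(grid):
    """Return {mu: coarse raster} for every mu from 2 up to the side length of grid.

    Each coarse cell stores the average of a mu x mu block of the square input.
    Assumption: trailing rows/columns that do not fill a whole block are ignored.
    """
    size = len(grid)
    table = summed_area_table(grid)
    result = {}
    for mu in range(2, size + 1):
        cells = size // mu
        coarse = []
        for bi in range(cells):
            top, bottom = bi * mu, (bi + 1) * mu
            row = []
            for bj in range(cells):
                left, right = bj * mu, (bj + 1) * mu
                # Inclusion-exclusion on the summed-area table gives the block sum.
                block_sum = (table[bottom][right] - table[top][right]
                             - table[bottom][left] + table[top][left])
                row.append(block_sum / (mu * mu))
            coarse.append(row)
        result[mu] = coarse
    return result


if __name__ == "__main__":
    example = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    for mu, raster in coarser_rasters(example).items():
        print(mu, raster)
```

Because the number of output cells over all µ is bounded by a constant times the number of input cells, the sketch performs a number of arithmetic operations linear in N.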

    Visibility maps of realistic terrains have linear smoothed complexity

    oai:journals.carleton.ca/jocg:article/12
    We study the complexity of the visibility map of terrains whose triangles are fat, not too steep and have roughly the same size. It is known that the complexity of the visibility map of such a terrain with n triangles is Θ(n²) in the worst case. We prove that if the elevations of the vertices of the terrain are subject to uniform noise which is proportional to the edge lengths, then the worst-case expected (smoothed) complexity is only Θ(n). We also prove non-trivial bounds for the smoothed complexity of instances where some triangles do not satisfy the above properties. Our results provide an explanation of why visibility maps of superlinear complexity are unlikely to be encountered in practice.