
    Using Avida to test the effects of natural selection on phylogenetic reconstruction methods

    Phylogenetic trees group organisms by their ancestral relationships. There are a number of distinct algorithms used to reconstruct these trees from molecular sequence data, but different methods sometimes give conflicting results. Since there are few precisely known phylogenies, simulations are typically used to test the quality of reconstruction algorithms. These simulations randomly evolve strings of symbols to produce a tree, and then the algorithms are run with the tree leaves as inputs. Here we use Avida to test two widely used reconstruction methods, which gives us the chance to observe the effect of natural selection on tree reconstruction. We find that if the organisms undergo natural selection between branch points, the methods will be successful even on very large time scales. However, these algorithms often falter when selection is absent.
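    To make the simulation protocol concrete, here is a rough Python sketch under invented assumptions (a balanced binary tree of fixed depth and a simple per-symbol substitution model, not Avida's digital organisms or natural selection): it evolves strings along a known tree and derives the pairwise distance matrix that a reconstruction method such as neighbor-joining would take as input.

        # Illustrative sketch of the simulation-based testing loop: evolve
        # strings along a known tree, then hand only the leaves to a
        # reconstruction method and compare topologies.
        import random

        ALPHABET = "ACGT"

        def mutate(seq, rate=0.02):
            """Replace each symbol with a random one with probability `rate`."""
            return "".join(random.choice(ALPHABET) if random.random() < rate else c
                           for c in seq)

        def evolve(seq, depth):
            """Evolve seq down a balanced binary tree; return the leaf sequences."""
            if depth == 0:
                return [seq]
            return evolve(mutate(seq), depth - 1) + evolve(mutate(seq), depth - 1)

        def hamming(a, b):
            """Fraction of differing positions between two equal-length strings."""
            return sum(x != y for x, y in zip(a, b)) / len(a)

        # The true phylogeny is known by construction; a reconstruction method
        # (e.g. neighbor-joining) would take this distance matrix as input.
        root = "".join(random.choice(ALPHABET) for _ in range(200))
        leaves = evolve(root, depth=4)                       # 16 leaf sequences
        dist = [[hamming(a, b) for b in leaves] for a in leaves]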

    Low Cost Quality of Service Multicast Routing in High Speed Networks

    Many of the services envisaged for high speed networks, such as B-ISDN/ATM, will support real-time applications with large numbers of users. Examples of these types of application range from those used by closed groups, such as private video meetings or conferences, where all participants must be known to the sender, to applications used by open groups, such as video lectures, where participants need not be known by the sender. These types of application will require high volumes of network resources in addition to the real-time delay constraints on data delivery. For these reasons, several multicast routing heuristics have been proposed to support both interactive and distribution multimedia services in high speed networks. The objective of such heuristics is to minimise the multicast tree cost while maintaining a real-time bound on delay. Previous evaluation work has compared the relative average performance of some of these heuristics and concluded that they are generally efficient, although some perform better for small multicast groups and others perform better for larger groups.

    Firstly, we present a detailed analysis and evaluation of some of these heuristics which illustrates that in some situations their average performance is reversed: a heuristic that in general produces efficient solutions for small multicasts may sometimes produce a more efficient solution for a particular large multicast in a specific network. Also, in a limited number of cases, using Dijkstra's algorithm produces the best result. We conclude that the efficiency of a heuristic solution depends on the topology of both the network and the multicast, and that it is difficult to predict. Because of this unpredictability, we propose the integration of two heuristics with Dijkstra's shortest path tree algorithm to produce a hybrid that consistently generates efficient multicast solutions for all possible multicast groups in any network. These heuristics are based on Dijkstra's algorithm, which maintains acceptable time complexity for the hybrid, and they rarely produce inefficient solutions for the same network/multicast. The resulting performance is generally good and, in the rare worst cases, is that of the shortest path tree. The performance of our hybrid is supported by our evaluation results.

    Secondly, we examine the stability of multicast trees where multicast group membership is dynamic. We conclude that, in general, the more efficient the solution of a heuristic is, the less stable the multicast tree will be as multicast group membership changes. For this reason, while the hybrid solution we propose might be suitable for use with closed user group multicasts, which are likely to be stable, we need a different approach for open user group multicasting, where group membership may be highly volatile. We propose an extension to an existing heuristic that ensures multicast tree stability where multicast group membership is dynamic. Although this extension decreases the efficiency of the heuristic's solutions, its performance is significantly better than that of the worst case, a shortest path tree.

    Finally, we consider how we might apply the hybrid and the extended heuristic in current and future multicast routing protocols for the Internet and for ATM networks.
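    As a point of reference for the baseline recurring above, the following minimal Python sketch, assuming a weighted undirected graph given as an adjacency dict, builds Dijkstra's shortest path tree from the source and prunes it to the multicast group; the cost-bounded hybrid heuristics themselves are not reproduced here.

        # Shortest path tree baseline for multicast routing (illustrative).
        import heapq

        def shortest_path_tree(graph, source):
            """Parent pointers of Dijkstra's shortest path tree from source."""
            dist, parent = {source: 0}, {source: None}
            heap = [(0, source)]
            while heap:
                d, u = heapq.heappop(heap)
                if d > dist.get(u, float("inf")):
                    continue                      # stale heap entry
                for v, w in graph[u].items():
                    nd = d + w
                    if nd < dist.get(v, float("inf")):
                        dist[v], parent[v] = nd, u
                        heapq.heappush(heap, (nd, v))
            return parent

        def multicast_tree(graph, source, group):
            """Keep only SPT edges on paths from the source to group members."""
            parent = shortest_path_tree(graph, source)
            edges = set()
            for member in group:
                node = member
                while parent.get(node) is not None:
                    edges.add((parent[node], node))
                    node = parent[node]
            return edges

        # Invented toy network: route a multicast from 's' to members 'd', 'e'.
        g = {"s": {"a": 1, "b": 4}, "a": {"s": 1, "b": 1, "d": 2},
             "b": {"s": 4, "a": 1, "e": 1}, "d": {"a": 2}, "e": {"b": 1}}
        print(multicast_tree(g, "s", {"d", "e"}))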

    Algorithmic Clustering of Music

    We present a fully automatic method for music classification, based only on compression of strings that represent the music pieces. The method uses no background knowledge about music whatsoever: it is completely general and can, without change, be used in different areas like linguistic classification and genomics. It is based on an ideal theory of the information content in individual objects (Kolmogorov complexity), information distance, and a universal similarity metric. Experiments show that the method distinguishes reasonably well between various musical genres and can even cluster pieces by composer.
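    The universal similarity metric referred to here is, in this line of work, the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)) for a compressor C. Below is a minimal Python sketch using zlib as a stand-in compressor; the paper's full pipeline (music-string extraction, a stronger compressor, quartet-tree clustering) is richer than this.

        # Normalized compression distance: similar objects compress well
        # together, so their NCD is low; unrelated objects score near 1.
        import zlib

        def c(data: bytes) -> int:
            """Compressed length of `data` in bytes (zlib, max effort)."""
            return len(zlib.compress(data, 9))

        def ncd(x: bytes, y: bytes) -> float:
            """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
            cx, cy, cxy = c(x), c(y), c(x + y)
            return (cxy - min(cx, cy)) / max(cx, cy)

        # Invented test strings: the similar pair scores lower than the
        # dissimilar pair. Clustering then runs on the NCD matrix.
        x1 = b"abcabcabcabc" * 50
        x2 = b"abcabcabcabc" * 50
        y = b"zqxwvutsrqpo" * 50
        print(ncd(x1, x2), ncd(x1, y))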

    Assessing the role of EO in biodiversity monitoring: options for integrating in-situ observations with EO within the context of the EBONE concept

    The European Biodiversity Observation Network (EBONE) is a European contribution on terrestrial monitoring to GEO BON, the Group on Earth Observations Biodiversity Observation Network. EBONE aims to develop a system of biodiversity observation at regional, national and European levels by assessing existing approaches in terms of their validity and applicability, starting in Europe and then expanding to regions in Africa. The objective of EBONE is to deliver:
    1. a sound scientific basis for the production of statistical estimates of stock and change of key indicators;
    2. a system for estimating past changes and for forecasting and testing policy options and management strategies for threatened ecosystems and species;
    3. a proposal for a cost-effective biodiversity monitoring system.
    There is a consensus that Earth Observation (EO) has a role to play in monitoring biodiversity. With its capacity to observe detailed spatial patterns and variability across large areas at regular intervals, EO promises to deliver the type of spatial and temporal coverage that is beyond the reach of in-situ efforts. Furthermore, when considering the emerging networks of in-situ observations, the prospect of enhancing the quality of the information whilst reducing cost through integration is compelling. This report gives a realistic assessment of the role of EO in biodiversity monitoring and the options for integrating in-situ observations with EO within the context of the EBONE concept (cf. EBONE-ID1.4). The assessment is mainly based on a set of targeted pilot studies. Building on this assessment, the report then presents a series of recommendations on the best options for using EO in an effective, consistent and sustainable biodiversity monitoring scheme. The issues we faced were many:
    1. Integration can be interpreted in different ways: as the combined use of independent data sets to deliver a different but improved data set, or as the use of one data set to complement another.
    2. The targeted improvement varies with stakeholder group: some seek more efficiency, others more reliable estimates (accuracy and/or precision), others more detail in space and/or time, or more of everything.
    3. Integration requires a link between the data sets (EO and in-situ). The strength of the link between reflected electromagnetic radiation and the habitats and biodiversity observed in-situ is a function of many variables, for example: the spatial scale and timing of the observations; the nomenclature adopted for classification; the complexity of the landscape in terms of composition, spatial structure and the physical environment; and the habitat and land cover types under consideration.
    4. The type of EO data available varies (as a function of, e.g., budget, size and location of the region, cloudiness, and national and/or international investment in airborne campaigns or space technology), which determines its capability to deliver the required output.
    EO and in-situ data could be combined in different ways, depending on the type of integration we wanted to achieve and the targeted improvement. We aimed for an improvement in accuracy (i.e. a reduction in the error of our indicator estimate calculated for an environmental zone). Furthermore, EO would also provide the spatial patterns for correlated in-situ data.
    EBONE, in its initial development, focused on three main indicators covering: (i) the extent and change of habitats of European interest in the context of a general habitat assessment; (ii) the abundance and distribution of selected species (birds, butterflies and plants); and (iii) the fragmentation of natural and semi-natural areas. For habitat extent, we decided that it did not matter how in-situ data were integrated with EO as long as we could demonstrate that acceptable accuracies could be achieved and the precision could consistently be improved. The nomenclature used to map habitats in-situ was the General Habitat Classification. We considered the following options, in which EO and in-situ data play different roles: using in-situ samples to re-calibrate a habitat map independently derived from EO; improving the accuracy of in-situ sampled habitat statistics by post-stratification with correlated EO data; and using in-situ samples to train the classification of EO data into habitat types where the EO data deliver full coverage or a larger number of samples. For some of these cases we also considered the impact that the sampling strategy employed to deliver the samples would have on the accuracy and precision achieved. Restricted access to Europe-wide species data prevented work on the indicator ‘abundance and distribution of species’. With respect to the indicator ‘fragmentation’, we investigated ways of delivering EO-derived measures of habitat patterns that are meaningful to sampled in-situ observations.
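    One of the integration options listed above, improving in-situ sampled statistics by post-stratification with correlated EO data, can be illustrated with a small sketch. Assuming a wall-to-wall EO classification that assigns each field plot to a stratum with a known area share, the post-stratified estimator weights per-stratum sample means by those shares; all names and numbers below are invented for illustration.

        # Post-stratified estimate of a habitat indicator (invented data).
        # EO supplies full-coverage strata with known area shares; in-situ
        # plots supply the measured indicator within each stratum.
        from collections import defaultdict

        # (stratum from the EO map, in-situ indicator value) per field plot.
        plots = [("forest", 0.82), ("forest", 0.78), ("grassland", 0.40),
                 ("grassland", 0.35), ("grassland", 0.31), ("urban", 0.05)]

        # Stratum area shares read off the EO classification (sum to 1).
        area_share = {"forest": 0.5, "grassland": 0.4, "urban": 0.1}

        by_stratum = defaultdict(list)
        for stratum, value in plots:
            by_stratum[stratum].append(value)

        # Weight each stratum's sample mean by its mapped area share.
        estimate = sum(area_share[s] * sum(v) / len(v)
                       for s, v in by_stratum.items())
        print(f"post-stratified indicator estimate: {estimate:.3f}")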

    Survey of data mining approaches to user modeling for adaptive hypermedia

    The ability of an adaptive hypermedia system to create tailored environments depends mainly on the amount and accuracy of information stored in each user model. Some of the difficulties that user modeling faces are the amount of data available to create user models, the adequacy of the data, the noise within that data, and the necessity of capturing the imprecise nature of human behavior. Data mining and machine learning techniques have the ability to handle large amounts of data and to process uncertainty. These characteristics make these techniques suitable for the automatic generation of user models that simulate human decision making. This paper surveys different data mining techniques that can be used to efficiently and accurately capture user behavior. The paper also presents guidelines that show which techniques may be used more efficiently according to the task implemented by the application.

    On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems

    We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of the training data. A two-step process is applied: 1) transformation of the original distribution into a standard uniform distribution by means of the probability integral transform (since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set); 2) construction of a Ruspini strong fuzzy partition in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite the aforementioned transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as the quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves.
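    The two-step process lends itself to a compact sketch. The following Python code is a minimal single-machine approximation assuming numpy only (the paper's distributed implementation is not reproduced): it approximates the CDF with q-quantiles, applies the probability integral transform, and lays a Ruspini strong fuzzy partition of equally spaced triangular membership functions over the transformed axis. np.interp serves as the piecewise-linear CDF and, with its arguments swapped, as the quantile function that recovers the fuzzy sets in the original space.

        # Quantile-based uniformization plus a Ruspini triangular partition.
        import numpy as np

        def fit_quantiles(x, q=32):
            """q-quantiles approximating the empirical CDF of one attribute."""
            probs = np.linspace(0.0, 1.0, q + 1)
            return probs, np.quantile(x, probs)

        def to_uniform(x, probs, quants):
            """Probability integral transform via the piecewise-linear CDF."""
            return np.interp(x, quants, probs)

        def triangular(u, centers):
            """Memberships of a Ruspini partition of triangles on [0, 1]."""
            width = centers[1] - centers[0]
            return np.clip(1.0 - np.abs(u[:, None] - centers[None, :]) / width,
                           0.0, 1.0)

        x = np.random.lognormal(size=10_000)      # skewed training attribute
        probs, quants = fit_quantiles(x)
        u = to_uniform(x, probs, quants)          # ~uniform on [0, 1]
        centers = np.linspace(0.0, 1.0, 7)        # 7 equally spaced fuzzy sets
        mu = triangular(u, centers)               # each row sums to ~1
        # Fuzzy-set cores back in the original space, via the inverse CDF:
        cores = np.interp(centers, probs, quants)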