
    Inferring Species Trees from Incongruent Multi-Copy Gene Trees Using the Robinson-Foulds Distance

    We present a new method for inferring species trees from multi-copy gene trees. Our method is based on a generalization of the Robinson-Foulds (RF) distance to multi-labeled trees (mul-trees), i.e., gene trees in which multiple leaves can have the same label. Unlike most previous phylogenetic methods using gene trees, this method does not assume that gene tree incongruence is caused by a single, specific biological process, such as gene duplication and loss, deep coalescence, or lateral gene transfer. We prove that it is NP-hard to compute the RF distance between two mul-trees, but it is easy to calculate the generalized RF distance between a mul-tree and a singly-labeled tree. Motivated by this observation, we formulate the RF supertree problem for mul-trees (MulRF), which takes a collection of mul-trees and constructs a species tree that minimizes the total RF distance from the input mul-trees. We present a fast heuristic algorithm for the MulRF supertree problem. Simulation experiments demonstrate that the MulRF method produces more accurate species trees than gene tree parsimony methods when incongruence is caused by gene tree error, duplications and losses, and/or lateral gene transfer. Furthermore, the MulRF heuristic runs quickly on data sets containing hundreds of trees with up to a hundred taxa. Comment: 16 pages, 11 figures
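    As a point of reference, the following sketch computes the standard RF distance between two singly-labeled unrooted trees: the number of nontrivial bipartitions (splits) that occur in exactly one of the two trees. It also computes the supertree objective, the summed RF distance from a candidate tree to a set of input trees. It does not implement the paper's generalization to mul-trees or its heuristic search. The nested-tuple tree encoding and the names `splits`, `rf_distance` and `total_rf` are illustrative assumptions, and some authors halve this count while this sketch does not.

```python
from typing import FrozenSet, Iterable, Set, Tuple, Union

# Assumed encoding: a leaf is a taxon name, an internal node is a tuple of children.
# The outermost tuple is only a drawing root; the tree is treated as unrooted.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(t: Tree) -> FrozenSet[str]:
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaves(c) for c in t))


def splits(t: Tree) -> Set[FrozenSet[str]]:
    """Nontrivial splits, each stored as the side NOT containing a reference taxon."""
    taxa = leaves(t)
    ref = min(taxa)
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            clade = frozenset([node])
        else:
            clade = frozenset().union(*(walk(c) for c in node))
        other = taxa - clade
        # Skip trivial splits (a single leaf) and the root (empty complement).
        if len(clade) >= 2 and len(other) >= 2:
            found.add(other if ref in clade else clade)
        return clade

    walk(t)
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must share the same taxon set")
    # Symmetric difference: splits present in exactly one tree.
    return len(splits(t1) ^ splits(t2))


def total_rf(candidate: Tree, inputs: Iterable[Tree]) -> int:
    """Supertree objective for singly-labeled inputs: sum of RF distances."""
    return sum(rf_distance(candidate, g) for g in inputs)


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2), total_rf(t1, [t1, t2]))
```

    Both example trees share the split ABC|DE but differ in AB|CDE versus AC|BDE, so their distance is 2. A supertree method searches tree space for the candidate minimizing `total_rf`.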

    Disassortativity of random critical branching trees

    Random critical branching trees (CBTs) are generated by the multiplicative branching process, in which the branching number is determined stochastically, independent of the degree of the ancestor. Here we show analytically that, despite this stochastic independence, a degree-degree correlation (DDC) exists in the CBT and is disassortative. Moreover, the skeletons of fractal networks, i.e., the maximum spanning trees formed by edge betweenness centrality, behave similarly to the CBT in their DDC. This analytic solution and observation support the argument that fractal scaling in complex networks originates from disassortativity in the DDC. Comment: 3 pages, 2 figures
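    A minimal Monte Carlo sketch of the claim. It assumes a Poisson offspring distribution with mean 1, which is one simple critical choice; the abstract does not fix the distribution. Each node's number of children is drawn independently of its ancestor's degree. The script then pools many finite trees and computes the Pearson correlation of the degrees at the two ends of every edge, counting each edge in both directions as in Newman's assortativity coefficient. A negative value indicates disassortativity. The size cap `max_nodes`, the sample size and all function names are hypothetical, and discarding oversized trees slightly biases the sample.

```python
import math
import random
from typing import List, Optional


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method for a Poisson(lam) draw."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def grow_tree(rng: random.Random, lam: float = 1.0,
              max_nodes: int = 10_000) -> Optional[List[int]]:
    """Grow one branching tree; returns a parent array (root = -1) or None if too big."""
    parent = [-1]
    frontier = [0]
    while frontier:
        node = frontier.pop()
        # Offspring count is independent of the node's own (or its ancestor's) degree.
        for _ in range(poisson(rng, lam)):
            parent.append(node)
            if len(parent) > max_nodes:
                return None
            frontier.append(len(parent) - 1)
    return parent


def degree_correlation(forest: List[List[int]]) -> float:
    """Pearson correlation of end-point degrees, each edge counted in both directions."""
    xs: List[int] = []
    ys: List[int] = []
    for parent in forest:
        deg = [0] * len(parent)
        for child, par in enumerate(parent):
            if par >= 0:
                deg[child] += 1
                deg[par] += 1
        for child, par in enumerate(parent):
            if par >= 0:
                xs += [deg[par], deg[child]]
                ys += [deg[child], deg[par]]
    n = len(xs)
    m = sum(xs) / n  # ys is a permutation of xs, so means and variances agree
    cov = sum((x - m) * (y - m) for x, y in zip(xs, ys)) / n
    var = sum((x - m) ** 2 for x in xs) / n
    return cov / var


rng = random.Random(42)
forest: List[List[int]] = []
while len(forest) < 2000:
    tree = grow_tree(rng)
    if tree is not None:
        forest.append(tree)

r = degree_correlation(forest)
print(f"degree-degree correlation r = {r:.3f}")
```

    The intuition is that, counted per edge, a parent's degree is size-biased (a node with more children contributes more edges) while a child's degree is not, so high-degree ends tend to meet lower-degree ends.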

    Node harvest

    When choosing a suitable technique for regression and classification with multivariate predictor variables, one is often faced with a tradeoff between interpretability and high predictive accuracy. To give a classical example, classification and regression trees are easy to understand and interpret, while tree ensembles such as Random Forests usually provide more accurate predictions. Yet tree ensembles are also more difficult to analyze than single trees and are often criticized, perhaps unfairly, as 'black box' predictors. Node harvest tries to reconcile the two aims of interpretability and predictive accuracy by combining positive aspects of trees and tree ensembles. Results are very sparse and interpretable, and predictive accuracy is extremely competitive, especially for low signal-to-noise data. The procedure is simple: an initial set of a few thousand nodes is generated randomly. If a new observation falls into just a single node, its prediction is the mean response of all training observations within this node, identical to a tree-like prediction. A new observation typically falls into several nodes, and its prediction is then the weighted average of the mean responses across all these nodes. The only role of node harvest is to 'pick' the right nodes from the initial large ensemble of nodes by choosing node weights, which in the proposed algorithm amounts to a quadratic programming problem with linear inequality constraints. The solution is sparse in the sense that only very few nodes are selected with a nonzero weight, even though this sparsity is not explicitly enforced. Perhaps surprisingly, no tuning parameter needs to be selected for optimal predictive accuracy. Node harvest can handle mixed data and missing values and is shown to be simple to interpret and competitive in predictive accuracy on a variety of data sets. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/10-AOAS367
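    The prediction rule can be sketched as follows. The node encoding (a box of half-open intervals on selected features) and the hand-set weights are hypothetical. Node harvest itself generates the nodes randomly and obtains the weights by solving the quadratic program, which this sketch does not attempt. Nodes with zero weight are ignored, mirroring the sparse solution.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Node:
    # Hypothetical encoding: feature index -> half-open interval [low, high)
    bounds: Dict[int, Tuple[float, float]]
    weight: float = 0.0  # set by hand here; node harvest fits weights via a QP
    mean: float = 0.0    # mean training response inside the node

    def contains(self, x: Sequence[float]) -> bool:
        return all(low <= x[j] < high for j, (low, high) in self.bounds.items())


def fit_means(nodes: List[Node], X: Sequence[Sequence[float]], y: Sequence[float]) -> None:
    """Store each node's mean response over the training observations it contains."""
    for node in nodes:
        inside = [yi for xi, yi in zip(X, y) if node.contains(xi)]
        if not inside:
            raise ValueError("node contains no training observations")
        node.mean = sum(inside) / len(inside)


def predict(nodes: List[Node], x: Sequence[float]) -> float:
    """Weighted average of node means over selected nodes that contain x."""
    active = [n for n in nodes if n.weight > 0 and n.contains(x)]
    total = sum(n.weight for n in active)
    if total == 0:
        raise ValueError("no selected node contains this observation")
    return sum(n.weight * n.mean for n in active) / total


X = [[0.1], [0.4], [0.6], [0.9]]
y = [1.0, 2.0, 3.0, 4.0]
nodes = [
    Node({0: (0.0, 0.5)}, weight=0.5),
    Node({0: (0.3, 1.0)}, weight=0.5),
    Node({0: (0.0, 1.0)}, weight=0.0),  # not selected (weight 0)
]
fit_means(nodes, X, y)
print(predict(nodes, [0.2]), predict(nodes, [0.45]), predict(nodes, [0.8]))
```

    With these toy nodes, x = 0.2 lies only in the first node and receives its training mean, 1.5, as a single tree would, while x = 0.45 lies in both selected nodes and receives the weighted average of their means, 2.25.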

    The utility of NBS profiling for plant systematics: a first study in tuber-bearing Solanum species

    Systematic relationships are important criteria that researchers and breeders use to select materials. We evaluated a novel molecular technique, nucleotide binding site (NBS) profiling, for its potential in phylogeny reconstruction. NBS profiling produces multiple markers in resistance genes and their analogs (RGAs). Potato (Solanum tuberosum L.) is a crop with a large secondary genepool, which contains many important traits that can be exploited in breeding programs. In this study we used a set of over 100 genebank accessions, representing 49 tuber-bearing wild and cultivated Solanum species. NBS profiling was compared to amplified fragment length polymorphism (AFLP). Cladistic and phenetic analyses showed that the two techniques had similar resolving power and delivered trees with similar topologies. However, the different statistical tests used to test the congruence of the trees were inconclusive. Visual inspection of the trees showed that, especially at the lower level, many accessions grouped together in the same way in both trees; at the higher level, when looking at the more basal nodes, only a few groups were well supported. Again, this was similar for both techniques. The poor support for higher-level groups might be due to the nature of the material and the way the species evolved. The similarity of the NBS and AFLP results indicates that the role of disease resistance in speciation is limited.
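    Phenetic analysis of dominant markers such as NBS or AFLP bands typically starts from a pairwise distance matrix computed on presence/absence profiles. The abstract does not say which coefficient was used, so this sketch assumes the Jaccard distance, one common choice for band data, and uses made-up profiles. The resulting matrix could then be clustered, for example with UPGMA or neighbor joining, to obtain a tree.

```python
from typing import Dict, List, Sequence


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """1 - |shared bands| / |bands present in either| for two 0/1 profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same markers")
    shared = sum(1 for u, v in zip(a, b) if u and v)
    either = sum(1 for u, v in zip(a, b) if u or v)
    # Shared absences are ignored; two empty profiles are treated as identical.
    return 1.0 - shared / either if either else 0.0


def distance_matrix(profiles: Dict[str, Sequence[int]]) -> List[List[float]]:
    names = list(profiles)
    return [[jaccard_distance(profiles[p], profiles[q]) for q in names] for p in names]


# Hypothetical band profiles for three accessions (1 = band present)
profiles = {
    "acc1": [1, 1, 0, 1],
    "acc2": [1, 0, 0, 1],
    "acc3": [0, 0, 1, 0],
}
for name, row in zip(profiles, distance_matrix(profiles)):
    print(name, ["%.3f" % d for d in row])
```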