Inferring Species Trees from Incongruent Multi-Copy Gene Trees Using the Robinson-Foulds Distance
We present a new method for inferring species trees from multi-copy gene
trees. Our method is based on a generalization of the Robinson-Foulds (RF)
distance to multi-labeled trees (mul-trees), i.e., gene trees in which multiple
leaves can have the same label. Unlike most previous phylogenetic methods using
gene trees, this method does not assume that gene tree incongruence is caused
by a single, specific biological process, such as gene duplication and loss,
deep coalescence, or lateral gene transfer. We prove that it is NP-hard to
compute the RF distance between two mul-trees, but it is easy to calculate the
generalized RF distance between a mul-tree and a singly-labeled tree. Motivated
by this observation, we formulate the RF supertree problem for mul-trees
(MulRF), which takes a collection of mul-trees and constructs a species tree
that minimizes the total RF distance from the input mul-trees. We present a
fast heuristic algorithm for the MulRF supertree problem. Simulation
experiments demonstrate that the MulRF method produces more accurate species
trees than gene tree parsimony methods when incongruence is caused by gene tree
error, duplications and losses, and/or lateral gene transfer. Furthermore, the
MulRF heuristic runs quickly on data sets containing hundreds of trees with up
to a hundred taxa. Comment: 16 pages, 11 figures
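The generalized RF distance for mul-trees is not reproduced here. As background only, the following is a minimal sketch of the ordinary RF distance between two singly-labeled unrooted trees: the number of nontrivial bipartitions (splits) found in one tree but not the other. Writing trees as nested Python tuples, reading them as unrooted, and reporting the unnormalized count are all assumptions made for illustration; some tools halve or normalize this value.

```python
from typing import FrozenSet, Set, Union

Tree = Union[str, tuple]  # assumption: a leaf is a label string, an internal node a tuple of children


def leaves(t: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels below t."""
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaves(c) for c in t))


def splits(t: Tree) -> Set[FrozenSet[str]]:
    """Nontrivial bipartitions of t, read as an unrooted tree.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so both rooted child edges of a binary root map
    to the same unrooted edge."""
    all_leaves = leaves(t)
    ref = min(all_leaves)
    result: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(c) for c in node))
        side = below if ref not in below else all_leaves - below
        # skip trivial splits (a single leaf, or everything but one leaf) and the empty root split
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
        return below

    walk(t)
    return result


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Count splits present in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must have the same leaf labels")
    return len(splits(t1) ^ splits(t2))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))  # 2: {C,D,E} occurs only in t1, {B,D,E} only in t2
```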
Disassortativity of random critical branching trees
Random critical branching trees (CBTs) are generated by the multiplicative
branching process, where the branching number of each node is determined
stochastically, independently of the degree of its ancestor. Here we show
analytically that, despite this stochastic independence, a degree-degree
correlation (DDC) exists in the CBT and that it is disassortative. Moreover, the
skeletons of fractal networks, i.e., the maximum spanning trees formed by the
edge betweenness centrality, behave similarly to the CBT with respect to the
DDC. These analytic results and observations support the argument that fractal
scaling in complex networks originates from the disassortativity in the DDC.
Comment: 3 pages, 2 figures
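The disassortativity claim can be checked numerically. What follows is a minimal sketch, not the paper's analytic derivation. It assumes Poisson(1) offspring numbers (mean 1, hence critical; the paper's branching distribution may differ) and a hypothetical size cap `max_nodes` that rejects oversized trees, since critical tree sizes are heavy-tailed. It pools edges from many independent trees and computes Newman's degree assortativity r: the Pearson correlation of the degrees at the two ends of each edge, with every edge counted in both directions.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def critical_tree_edges(rng: random.Random, max_nodes: int) -> Optional[List[Edge]]:
    """Grow one Galton-Watson tree with Poisson(1) offspring (assumed critical model).

    Returns the (parent, child) edges, or None if the tree would exceed max_nodes."""
    edges: List[Edge] = []
    frontier = [0]  # node 0 is the root
    next_id = 1
    while frontier:
        parent = frontier.pop()
        for _ in range(poisson(rng, 1.0)):
            if next_id >= max_nodes:
                return None  # reject oversized trees
            edges.append((parent, next_id))
            frontier.append(next_id)
            next_id += 1
    return edges


def degree_assortativity(edge_lists: List[List[Edge]]) -> float:
    """Pearson correlation of end-point degrees over all edges, each edge counted both ways."""
    xs: List[int] = []
    ys: List[int] = []
    for edges in edge_lists:
        deg: Dict[int, int] = {}
        for u, v in edges:
            deg[u] = deg.get(u, 0) + 1
            deg[v] = deg.get(v, 0) + 1
        for u, v in edges:
            xs += [deg[u], deg[v]]
            ys += [deg[v], deg[u]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    if vx == 0 or vy == 0:
        raise ValueError("all edge ends have the same degree; r is undefined")
    return cov / math.sqrt(vx * vy)


rng = random.Random(42)
forest: List[List[Edge]] = []
while len(forest) < 3000:
    edges = critical_tree_edges(rng, max_nodes=1000)
    if edges:  # skip rejected (None) and single-node (empty) trees
        forest.append(edges)

r = degree_assortativity(forest)
print(f"pooled degree assortativity r = {r:.3f}")  # expected to be negative (disassortative)
```

One way to see why the sign is negative even though offspring numbers are drawn independently: an edge reaches a parent with probability proportional to that parent's number of children. The parent end of an edge therefore carries a size-biased degree, while the child end does not. Correlating the two ends symmetrically then gives a negative covariance whenever their mean degrees differ.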
Node harvest
When choosing a suitable technique for regression and classification with
multivariate predictor variables, one is often faced with a tradeoff between
interpretability and high predictive accuracy. To give a classical example,
classification and regression trees are easy to understand and interpret. Tree
ensembles like Random Forests usually provide more accurate predictions. Yet
tree ensembles are also more difficult to analyze than single trees and are
often criticized, perhaps unfairly, as "black box" predictors. Node harvest
tries to reconcile the two aims of interpretability and predictive accuracy by
combining positive aspects of trees and tree ensembles. Results are very sparse
and interpretable, and predictive accuracy is extremely competitive, especially
for low signal-to-noise data. The procedure is simple: an initial set of a few
thousand nodes is generated randomly. If a new observation falls into just a
single node, its prediction is the mean response of all training observations
within this node, identical to a tree-like prediction. A new observation
typically falls into several nodes, and its prediction is then the weighted
average of the mean responses across all these nodes. The only role of node
harvest is to "pick" the right nodes from the initial large ensemble of nodes by
choosing node weights, which in the proposed algorithm amounts to a quadratic
programming problem with linear inequality constraints. The solution is sparse
in the sense that only very few nodes are selected with a nonzero weight. This
sparsity is not explicitly enforced. Perhaps surprisingly, it is not necessary to
select a tuning parameter for optimal predictive accuracy. Node harvest can
handle mixed data and missing values and is shown to be simple to interpret and
competitive in predictive accuracy on a variety of data sets.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS367 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
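The prediction rule described above can be sketched as follows. This minimal sketch assumes nodes are axis-aligned boxes over the predictors, represented by a hypothetical `Node` class, and that node weights are already chosen. In node harvest the weights come from the quadratic program, which this sketch does not solve, so here they are set by hand. Missing values and mixed data are not handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Node:
    # Hypothetical representation: feature index -> half-open interval [low, high)
    bounds: Dict[int, Tuple[float, float]]
    weight: float = 0.0  # nonnegative node weight (in node harvest: the QP solution)
    mean: float = 0.0    # mean training response of observations inside the node

    def contains(self, x: Sequence[float]) -> bool:
        return all(low <= x[j] < high for j, (low, high) in self.bounds.items())


def fit_node_means(nodes: List[Node], X: List[List[float]], y: List[float]) -> None:
    """Set each node's mean response from the training observations it contains."""
    for node in nodes:
        inside = [yi for xi, yi in zip(X, y) if node.contains(xi)]
        node.mean = sum(inside) / len(inside) if inside else 0.0


def predict(nodes: List[Node], x: Sequence[float]) -> Optional[float]:
    """Weighted average of mean responses over the nonzero-weight nodes containing x."""
    active = [n for n in nodes if n.weight > 0 and n.contains(x)]
    total = sum(n.weight for n in active)
    if total == 0:
        return None  # x falls into no selected node
    return sum(n.weight * n.mean for n in active) / total


X = [[0.1], [0.4], [0.6], [0.9]]
y = [1.0, 2.0, 3.0, 4.0]
nodes = [
    Node({0: (0.0, 0.5)}, weight=1.0),  # contains 0.1, 0.4 -> mean 1.5
    Node({0: (0.5, 1.0)}, weight=1.0),  # contains 0.6, 0.9 -> mean 3.5
    Node({0: (0.3, 0.8)}, weight=1.0),  # contains 0.4, 0.6 -> mean 2.5
    Node({0: (0.0, 1.0)}, weight=0.0),  # not selected (zero weight)
]
fit_node_means(nodes, X, y)
print(predict(nodes, [0.2]))   # 1.5: falls into a single selected node (tree-like prediction)
print(predict(nodes, [0.45]))  # 2.0: average of 1.5 and 2.5 with equal weights
```

Zero-weight nodes never contribute, which mirrors the sparsity described in the abstract: only nodes the weight selection keeps take part in predictions.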
The utility of NBS profiling for plant systematics: a first study in tuber-bearing Solanum species
Systematic relationships are important criteria for researchers and breeders when selecting materials. We evaluated a novel molecular technique, nucleotide binding site (NBS) profiling, for its potential in phylogeny reconstruction. NBS profiling produces multiple markers in resistance genes and their analogs (RGAs). Potato (Solanum tuberosum L.) is a crop with a large secondary genepool, which contains many important traits that can be exploited in breeding programs. In this study, we used a set of over 100 genebank accessions, representing 49 tuber-bearing wild and cultivated Solanum species. NBS profiling was compared to amplified fragment length polymorphism (AFLP). Cladistic and phenetic analyses showed that the two techniques had similar resolving power and delivered trees with a similar topology. However, the different statistical tests used to assess the congruence of the trees were inconclusive. Visual inspection of the trees showed that, especially at the lower level, many accessions grouped together in the same way in both trees; at the higher level, when looking at the more basal nodes, only a few groups were well supported. Again, this was similar for both techniques. The observation that higher-level groups were poorly supported might be due to the nature of the material and the way the species evolved. The similarity of the NBS and AFLP results indicates that the role of disease resistance in speciation is limited.
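The abstract does not name the distance measure or the congruence tests used. As an illustration only, one common phenetic approach for dominant presence/absence markers such as AFLP or NBS bands is to compute a Jaccard distance between accessions. Two distance matrices built from different marker systems can then be compared by correlating their off-diagonal entries; this is the statistic a Mantel test evaluates by permutation, and the permutation step is omitted here. The band profiles below are made up.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """1 - (bands shared) / (bands present in either profile), for 0/1 band vectors."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - shared / either


def distance_matrix(profiles: List[List[int]]) -> List[List[float]]:
    """Pairwise Jaccard distances between accessions (rows)."""
    return [[jaccard_distance(p, q) for q in profiles] for p in profiles]


def matrix_correlation(d1: List[List[float]], d2: List[List[float]]) -> float:
    """Pearson correlation of the upper-triangle entries of two distance matrices."""
    idx = list(combinations(range(len(d1)), 2))
    xs = [d1[i][j] for i, j in idx]
    ys = [d2[i][j] for i, j in idx]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical band profiles for four accessions from two marker systems
nbs = [[1, 1, 0, 0, 1], [1, 1, 0, 1, 1], [0, 0, 1, 1, 0], [0, 1, 1, 1, 0]]
aflp = [[1, 0, 1, 1, 0, 0], [1, 0, 1, 1, 1, 0], [0, 1, 0, 0, 1, 1], [0, 1, 0, 1, 1, 1]]

d_nbs = distance_matrix(nbs)
d_aflp = distance_matrix(aflp)
r = matrix_correlation(d_nbs, d_aflp)
print(f"correlation between NBS and AFLP distances: {r:.3f}")
```

A high correlation between the two matrices is consistent with similar phenetic signal in both marker systems. It does not by itself establish that the resulting trees share a topology.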
- …