
    Solving Multiclass Learning Problems via Error-Correcting Output Codes

    Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definition is acquired by studying collections of training examples of the form [x_i, f(x_i)]. Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decision-tree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which error-correcting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of overfitting avoidance techniques such as decision-tree pruning. Finally, we show that, like the other methods, the error-correcting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.
    Comment: See http://www.jair.org/ for any accompanying file.
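The decoding step behind an error-correcting output representation can be sketched briefly. The snippet below is a minimal illustration, not the paper's implementation: the 7-bit codebook, the class labels and the function names are all hypothetical, and the trained binary learners that would produce the bit vector are assumed to exist elsewhere. Each class gets a codeword, one binary learner predicts each bit, and the predicted bit string is assigned to the class whose codeword is nearest in Hamming distance.

```python
# Hypothetical codebook: 4 classes, 7 bits each. Any two codewords differ
# in 4 positions, so a single wrong bit still decodes to the right class.
CODEBOOK = {
    "a": (1, 1, 1, 1, 1, 1, 1),
    "b": (0, 0, 0, 0, 1, 1, 1),
    "c": (0, 0, 1, 1, 0, 0, 1),
    "d": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(u, v):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))


def decode(bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODEBOOK, key=lambda label: hamming(CODEBOOK[label], bits))


def min_distance():
    """Smallest Hamming distance between any two distinct codewords."""
    labels = list(CODEBOOK)
    return min(
        hamming(CODEBOOK[p], CODEBOOK[q])
        for i, p in enumerate(labels)
        for q in labels[i + 1:]
    )


# The codeword for "c" with its sixth bit flipped by one faulty learner.
noisy = (0, 0, 1, 1, 0, 1, 1)
predicted = decode(noisy)
```

With a minimum distance of d between codewords, nearest-codeword decoding tolerates up to floor((d - 1) / 2) wrong bits, which is why the separation of the rows matters more than the particular bit patterns.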

    Microsatellite genotyping of apple (Malus × domestica Borkh.) genetic resources in the Netherlands: application in collection management and variety identification

    A highly informative set of 16 microsatellite markers was used to fingerprint 695 apple accessions from eight Dutch collections. Among the total sample, 475 different genotypes were distinguished based on multi-locus microsatellite variation, revealing a potential redundancy within the total sample of 32%. The majority of redundancies were found between collections, rather than within collections. No single collection covered the total observed diversity well, as each collection consisted of about 50% of unique accessions. These findings reflected the fact that most collection holders focus on common Dutch varieties, as well as on region-specific diversity. Based on the diversity patterns observed, maintenance of genetic resources by a network of co-operating collection holders, rather than by collecting the total diversity in a single collection, appears to be an efficient approach. Comparison of microsatellite and passport data showed that for many accessions the marker data did not provide support for the registered variety names. Verification of accessions showed that discrepancies between passport and molecular data were largely due to documentation and phenotypic determination errors. With the help of the marker data the varietal names of 45 accessions could be corrected. Microsatellite genotyping of apple appears to be an efficient tool in the management of collections and in variety identification. The development of a marker database was considered relevant as a reference instrument in variety identification and as a source of information about thus far unexplored diversity that could be of interest in the development of new apple varieties.
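The 32% figure is consistent with the counts in the abstract: (695 - 475) / 695 is roughly 0.317. The sketch below assumes redundancy means the share of accessions beyond one per distinct multi-locus genotype; that definition, the function name and the toy profiles are illustrative assumptions rather than details from the paper.

```python
def redundancy(profiles):
    """Fraction of accessions that duplicate an already-seen genotype."""
    total = len(profiles)
    distinct = len(set(profiles))
    return (total - distinct) / total


# Counts reported in the abstract: 695 accessions, 475 distinct genotypes.
reported = (695 - 475) / 695

# Toy data (made-up allele sizes): one tuple of allele pairs per accession.
toy = [
    ((152, 160), (98, 104)),
    ((152, 160), (98, 104)),   # same profile as the first accession
    ((148, 160), (98, 110)),
    ((152, 156), (100, 104)),
]
toy_redundancy = redundancy(toy)
```

Exact tuple matching is the simplest possible comparison; real marker data would likely need a tolerance for scoring errors and missing alleles.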

    Grading of parameters for urban tree inventories by city officials, arborists and academics using the Delphi method

    Tree inventories are expensive to conduct and update, so the value of every inventory carried out must be maximized. However, increasing the number of constituent parameters increases the cost of performing and updating the inventory, illustrating the need for careful parameter selection. This paper reports the results of a systematic expert rating of tree inventory parameters, aiming to quantify the relative importance of each parameter. Using the Delphi method, panels comprising city officials, arborists and academics rated a total of 148 parameters. In order of total mean score, the top ranking parameters, which can serve as a guide for decision-making at a practical level and for standardization of tree inventories, were: Scientific name of the tree species and genera, Vitality, Coordinates, Hazard class and Identification number. The study also examined whether the different responsibilities and usage of urban tree databases among organizations and people engaged in urban tree inventories affected their prioritization. The results revealed noticeable dissimilarities in the ranking of parameters between the panels, underlining the need for collaboration between the research community and those commissioning, administrating and conducting inventories. Only by applying such a transdisciplinary approach to parameter selection can urban tree inventories be strengthened and made more relevant.

    Self-Correcting Broadcast in Distributed Hash Tables

    We present two broadcast algorithms that can be used on top of distributed hash tables (DHTs) to perform group communication and arbitrary queries. Unlike other P2P group communication mechanisms, which either embed extra information in the DHTs or use random overlay networks, our algorithms take advantage of the structured DHT overlay networks without maintaining additional information. The proposed algorithms do not send any redundant messages. Furthermore, the two algorithms ensure 100% coverage of the nodes in the system even when routing information is outdated as a result of dynamism in the network. The first algorithm performs some correction of outdated routing table entries with a low cost of correction traffic. The second algorithm exploits the nature of the broadcasts to extensively update erroneous routing information at the cost of higher correction traffic. The algorithms are validated and evaluated in our stochastic distributed-algorithms simulator.
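How a structured overlay can broadcast without redundant messages may be easier to see in code. The sketch below is an assumption-laden illustration, not the authors' algorithms: it assumes a Chord-style ring with power-of-two finger tables that are all correct, and it omits the self-correction of outdated entries that the abstract describes. Each node forwards the message to the fingers that fall inside its assigned interval and hands each of them the sub-interval up to the next finger, so the intervals never overlap. All function names are hypothetical.

```python
def in_open_interval(x, a, b, size):
    """True if x lies strictly between a and b going clockwise on the ring.

    When a == b the interval is taken to be the whole ring except a.
    """
    span = (b - a) % size or size
    return 0 < (x - a) % size < span


def build_fingers(nodes, m):
    """Chord-style fingers: successor(n + 2**i) for i in 0..m-1, per node."""
    size = 2 ** m
    ring = sorted(nodes)

    def successor(key):
        key %= size
        for n in ring:
            if n >= key:
                return n
        return ring[0]  # wrap around the ring

    return {
        n: sorted(
            {successor(n + 2 ** i) for i in range(m)} - {n},
            key=lambda f: (f - n) % size,
        )
        for n in ring
    }


def broadcast(start, fingers, m):
    """Return the list of nodes in the order they receive the message."""
    size = 2 ** m
    deliveries = []
    pending = [(start, start)]  # (receiving node, limit of its interval)
    while pending:
        node, limit = pending.pop()
        deliveries.append(node)
        inside = [f for f in fingers[node]
                  if in_open_interval(f, node, limit, size)]
        for i, finger in enumerate(inside):
            # Each finger covers the ring up to the next finger (or limit).
            next_limit = inside[i + 1] if i + 1 < len(inside) else limit
            pending.append((finger, next_limit))
    return deliveries


nodes = [1, 4, 9, 11, 14, 18, 20, 21, 28]
received = broadcast(1, build_fingers(nodes, m=5), m=5)
```

Because the first finger of a node is its immediate successor, no node inside an interval is skipped, and because the delegated intervals are disjoint, every node receives the message once.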

    Molecular phylogeny of brachiopods and phoronids based on nuclear-encoded small subunit ribosomal RNA gene sequences

    Brachiopod and phoronid phylogeny is inferred from SSU rDNA sequences of 28 articulate and nine inarticulate brachiopods, three phoronids, two ectoprocts and various outgroups, using gene trees reconstructed by weighted parsimony, distance and maximum likelihood methods. Of these sequences, 33 from brachiopods, two from phoronids and one each from an ectoproct and a priapulan are newly determined. The brachiopod sequences belong to 31 different genera and thus survey about 10% of extant genus-level diversity. Sequences determined in different laboratories and those from closely related taxa agree well, but evidence is presented suggesting that one published phoronid sequence (GenBank accession UO12648) is a brachiopod-phoronid chimaera, and this sequence is excluded from the analyses. The chiton, Acanthopleura, is identified as the phenetically proximal outgroup; other selected outgroups were chosen to allow comparison with recent, non-molecular analyses of brachiopod phylogeny. The different outgroups and methods of phylogenetic reconstruction lead to similar results, with differences mainly in the resolution of weakly supported ancient and recent nodes, including the divergence of inarticulate brachiopod sub-phyla, the position of the rhynchonellids in relation to long- and short-looped articulate brachiopod clades and the relationships of some articulate brachiopod genera and species. Attention is drawn to the problem presented by nodes that are strongly supported by non-molecular evidence but receive only low bootstrap resampling support. Overall, the gene trees agree with morphology-based brachiopod taxonomy, but novel relationships are tentatively suggested for thecideidine and megathyrid brachiopods. Articulate brachiopods are found to be monophyletic in all reconstructions, but monophyly of inarticulate brachiopods and the possible inclusion of phoronids in the inarticulate brachiopod clade are less strongly established. 
    Phoronids are clearly excluded from a sister-group relationship with articulate brachiopods, this proposed relationship being due to the rejected, chimaeric sequence (GenBank UO12648). Lineage relative rate tests show no heterogeneity of evolutionary rate among articulate brachiopod sequences, but indicate that inarticulate brachiopod plus phoronid sequences evolve somewhat more slowly. Both brachiopods and phoronids evolve slowly by comparison with other invertebrates. A number of palaeontologically dated times of earliest appearance are used to make upper and lower estimates of the global rate of brachiopod SSU rDNA evolution, and these estimates are used to infer the likely divergence times of other nodes in the gene tree. There is reasonable agreement between most inferred molecular and palaeontological ages. The estimated rates of SSU rDNA sequence evolution suggest that the last common ancestor of brachiopods, chitons and other protostome invertebrates (Lophotrochozoa and Ecdysozoa) lived deep in Precambrian time. Results of this first DNA-based, taxonomically representative analysis of brachiopod phylogeny are in broad agreement with current morphology-based classification and systematics and are largely consistent with the hypothesis that brachiopod shell ontogeny and morphology are a good guide to phylogeny.

    A comparison of climate simulations for the last glacial maximum with three different versions of the ECHAM model and implications for summer-green tree refugia

    Model simulations of the last glacial maximum (21 ± 2 ka) with the ECHAM3 T42 atmosphere-only, ECHAM5-MPIOM T31 atmosphere-ocean coupled and ECHAM5 T106 atmosphere-only models are compared. The topography, land-sea mask and glacier distribution for the ECHAM5 simulations were taken from the Paleoclimate Modelling Intercomparison Project Phase II (PMIP2) data set while for ECHAM3 they were taken from PMIP1. The ECHAM5-MPIOM T31 model produced its own sea surface temperatures (SST), while the ECHAM5 T106 simulations were forced at the boundaries by this coupled model's SSTs, corrected for their present-day biases, and the ECHAM3 T42 model was forced with prescribed SSTs provided by the Climate/Long-Range Investigation, Mapping, and Prediction project (CLIMAP). The SSTs in the ECHAM5-MPIOM simulation for the last glacial maximum (LGM) were much warmer in the northern Atlantic than those suggested by CLIMAP or Overview of Glacial Atlantic Ocean Mapping (GLAMAP) while the SSTs were cooler everywhere else. This had a clear effect on the temperatures over Europe, warmer for winters in western Europe and cooler for eastern Europe than the simulation with CLIMAP SSTs. Considerable differences in the general circulation patterns were found in the different simulations. A ridge over western Europe for the present climate during winter in the 500 hPa height field remains in both ECHAM5 simulations for the LGM, more so in the T106 version, while the ECHAM3 CLIMAP-SST simulation provided a trough which is consistent with cooler temperatures over western Europe. The zonal wind between 30° W and 10° E shows a southward shift of the polar and subtropical jets in the simulations for the LGM, least obvious in the ECHAM5 T31 one, and an extremely strong polar jet for the ECHAM3 CLIMAP-SST run. The latter can probably be assigned to the much stronger north-south gradient in the CLIMAP SSTs.
    The southward shift of the polar jet during the LGM is supported by palaeo-data. Cyclone tracks in winter represented by high precipitation are characterised over Europe for the present by a main branch from the British Isles to Norway and a secondary branch towards the Mediterranean Sea, observed and simulated. For the LGM the different models show very different solutions: the ECHAM3 CLIMAP-SST simulation shows just one track going eastward from the British Isles into central Europe, while the ECHAM5 T106 simulation still has two branches but during the LGM the main one goes to the Mediterranean Sea, with enhanced precipitation in the Levant. This agrees with an observed high stand of the Dead Sea during the LGM. For summer the ECHAM5 T106 simulation provides much more precipitation for the present over Europe than the other simulations, thus agreeing with estimates by the Global Precipitation Climatology Project (GPCP). Also during the LGM this model makes Europe less arid than the other simulations. In many respects the ECHAM5 T106 simulation for the present is more realistic than the ECHAM5 T31 coupled simulation and the older ECHAM3 T42 simulation, when comparing them with the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis or the GPCP precipitation data. For validating the model data for the LGM, pollen, wood and charcoal analyses were compared with possible summer-green tree growth from model estimates using summer precipitation, minimum winter temperatures and growing degree days (above 5 °C). The ECHAM5 T106 simulation suggests likely tree growth during the LGM at more of the sites with palaeo-data findings than the other simulations do, especially over western Europe. The clear message especially from the ECHAM5 T106 simulation is that warm-loving summer-green trees could have survived mainly in Spain but also in Greece in agreement with findings of pollen or charcoal.
    Southern Italy is also suggested but this could not be validated because of the absence of palaeo-data. Previous climate simulations of the LGM have suggested a less cold and more humid climate than that reconstructed from pollen findings. Our model results do agree more or less with those of other models but we do not find a contradiction with palaeo-data because we use the pollen data directly without an intermediate reconstruction of temperatures and precipitation from the pollen spectra.
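One of the tree-growth criteria named in this abstract, growing degree days above 5 °C, can be illustrated with a small sketch. It assumes the common definition, the sum over days of the amount by which the daily mean temperature exceeds the base; the abstract does not give the formula or any thresholds, so the function name and the sample temperatures are hypothetical.

```python
def growing_degree_days(daily_mean_temps_c, base_c=5.0):
    """Sum of daily mean temperature excess over the base (degree days)."""
    return sum(max(0.0, t - base_c) for t in daily_mean_temps_c)


# Made-up daily means in degrees Celsius; days at or below 5 add nothing.
sample = [3.0, 5.0, 7.5, 10.0]
gdd5 = growing_degree_days(sample)
```

A site test would then compare this total, together with summer precipitation and minimum winter temperature, against limits for the tree type in question; those limits are not stated in the abstract and are left out here.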

    Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants

    Sequences of two chloroplast photosystem genes, psaA and psbB, together comprising about 3,500 bp, were obtained for all five major groups of extant seed plants and several outgroups among other vascular plants. Strongly supported, but significantly conflicting, phylogenetic signals were obtained in parsimony analyses from partitions of the data into first and second codon positions versus third positions. In the former, both genes agreed on a monophyletic gymnosperms, with Gnetales closely related to certain conifers. In the latter, Gnetales are inferred to be the sister group of all other seed plants, with gymnosperms paraphyletic. None of the data supported the modern “anthophyte hypothesis,” which places Gnetales as the sister group of flowering plants. A series of simulation studies were undertaken to examine the error rate for parsimony inference. Three kinds of errors were examined: random error, systematic bias (both properties of finite data sets), and statistical inconsistency owing to long-branch attraction (an asymptotic property). Parsimony reconstructions were extremely biased for third-position data for psbB. Regardless of the true underlying tree, a tree in which Gnetales are sister to all other seed plants was likely to be reconstructed for these data. None of the combinations of genes or partitions permits the anthophyte tree to be reconstructed with high probability. Simulations of progressively larger data sets indicate the existence of long-branch attraction (statistical inconsistency) for third-position psbB data if either the anthophyte tree or the gymnosperm tree is correct. This is also true for the anthophyte tree using either psaA third positions or psbB first and second positions. A factor contributing to bias and inconsistency is extremely short branches at the base of the seed plant radiation, coupled with extremely high rates in Gnetales and nonseed plant outgroups.
    M. J. Sanderson,* M. F. Wojciechowski,*† J.-M. Hu,* T. Sher Khan,* and S. G. Brad