
    Boosted Classification Trees and Class Probability/Quantile Estimation

    The standard by which binary classifiers are usually judged, misclassification error, assumes equal costs of misclassifying the two classes or, equivalently, classifying at the 1/2 quantile of the conditional class probability function P[y = 1 | x]. Boosted classification trees are known to perform quite well for such problems. In this article we consider the use of standard, off-the-shelf boosting for two more general problems: 1) classification with unequal costs or, equivalently, classification at quantiles other than 1/2, and 2) estimation of the conditional class probability function P[y = 1 | x]. We first examine whether the latter problem, estimation of P[y = 1 | x], can be solved with LogitBoost, or with AdaBoost combined with a natural link function. The answer is negative: both approaches are often ineffective because they overfit P[y = 1 | x] even though they perform well as classifiers. A major negative point of the present article is therefore the disconnect between class probability estimation and classification. Next we consider the practice of over/under-sampling of the two classes. We present an algorithm that uses AdaBoost in conjunction with over/under-sampling and jittering of the data (“JOUS-Boost”). This algorithm is simple yet successful, and it preserves boosting's relative protection against overfitting while allowing arbitrary misclassification costs and, equivalently, arbitrary quantile boundaries. We then use collections of classifiers obtained from a grid of quantiles to form estimators of class probabilities. These class probability estimates compare favorably to those obtained by a variety of methods across both simulated and real data sets.
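    The JOUS-Boost idea can be sketched in a few lines: to classify at quantile q, resample the training data so that the two classes appear in proportions roughly (1 - q) : q, jitter the duplicated rows to break ties, and run ordinary AdaBoost; repeating this over a grid of quantiles gives a rough estimate of P[y = 1 | x]. The sketch below uses scikit-learn's AdaBoostClassifier; the function names, the resampling scheme, and the jitter scale are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def jous_boost_fit(X, y, q, jitter_scale=1e-6, random_state=0):
    """Sketch of one JOUS-Boost step: resample (with jitter) so that
    thresholding the rebalanced problem at 1/2 corresponds to
    thresholding the original P[y = 1 | x] at quantile q, then run
    plain AdaBoost on the rebalanced data.  Assumes y is coded 0/1."""
    rng = np.random.default_rng(random_state)
    pos, neg = X[y == 1], X[y == 0]
    # Resample so the classes appear in proportions (1 - q) : q.
    n_pos = max(1, int(round(len(X) * (1 - q))))
    n_neg = max(1, int(round(len(X) * q)))
    X_bal = np.vstack([pos[rng.integers(0, len(pos), n_pos)],
                       neg[rng.integers(0, len(neg), n_neg)]])
    # Jitter the duplicated rows so boosting does not see exact ties.
    X_bal = X_bal + rng.normal(scale=jitter_scale, size=X_bal.shape)
    y_bal = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
    return AdaBoostClassifier(n_estimators=200,
                              random_state=random_state).fit(X_bal, y_bal)

def probability_grid_estimate(X, y, X_new,
                              quantiles=np.linspace(0.1, 0.9, 9)):
    """Rough estimate of P[y = 1 | x]: the fraction of quantile
    classifiers on the grid that still vote for class 1."""
    votes = np.stack([jous_boost_fit(X, y, q).predict(X_new)
                      for q in quantiles])
    return votes.mean(axis=0)
```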

    Comment: Boosting Algorithms: Regularization, Prediction and Model Fitting

    The authors are doing the readers of Statistical Science a true service with a well-written and up-to-date overview of boosting that originated with the seminal algorithms of Freund and Schapire. Equally, we are grateful for high-level software that will permit a larger readership to experiment with, or simply apply, boosting-inspired model fitting. The authors show us a world of methodology that illustrates how a fundamental innovation can penetrate every nook and cranny of statistical thinking and practice. They introduce the reader to one particular interpretation of boosting and then give a display of its potential with extensions from classification (where it all started) to least squares, exponential family models, survival analysis, to base-learners other than trees such as smoothing splines, to degrees of freedom and regularization, and to fascinating recent work in model selection. The uninitiated reader will find that the authors did a nice job of presenting a certain coherent and useful interpretation of boosting. The other reader, though, who has watched the business of boosting for a while, may have quibbles with the authors over details of the historical record and, more importantly, over their optimism about the current state of theoretical knowledge. In fact, as much as the statistical view has proven fruitful, it has also resulted in some ideas about why boosting works that may be misconceived, and in some recommendations that may be misguided.
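    For readers new to the "boosting as regularized model fitting" view discussed in this comment, the sketch below shows componentwise L2Boosting, the least-squares variant mentioned above. It is a generic illustration (columns assumed centered, all names invented for the example), not code from the paper under discussion or its accompanying software.

```python
import numpy as np

def l2_boost(X, y, n_steps=200, nu=0.1):
    """Componentwise L2Boosting: at every step, fit the current residuals
    by least squares on the single best predictor column and take a small,
    nu-shrunken step.  Columns of X are assumed to be centered."""
    _, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    col_norms = (X ** 2).sum(axis=0)
    for _ in range(n_steps):
        betas = X.T @ resid / col_norms             # per-column OLS slopes
        losses = ((resid[:, None] - X * betas) ** 2).sum(axis=0)
        j = int(np.argmin(losses))                  # best-fitting column
        coef[j] += nu * betas[j]
        resid -= nu * betas[j] * X[:, j]            # gradient-style update
    return intercept, coef
```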

    Working on the Argument Pipeline: Through Flow Issues between Natural Language Argument, Instantiated Arguments, and Argumentation Frameworks

    In many domains of public discourse such as arguments about public policy, there is an abundance of knowledge to store, query, and reason with. To use this knowledge, we must address two key general problems: first, the knowledge acquisition bottleneck between forms in which the knowledge is usually expressed, e.g., natural language, and forms which can be automatically processed; second, reasoning with the uncertainties and inconsistencies of the knowledge. Given such complexities, it is labour- and knowledge-intensive to conduct policy consultations, where participants contribute statements to the policy discourse. Yet, from such a consultation, we want to derive policy positions, where each position is a set of consistent statements, but where positions may be mutually inconsistent. To address these problems and support policy-making consultations, we consider recent automated techniques in natural language processing, instantiating arguments, and reasoning with the arguments in argumentation frameworks. We discuss application and “bridge” issues between these techniques, outlining a pipeline of technologies whereby: expressions in a controlled natural language are parsed and translated into a logic (a literals and rules knowledge base), from which we generate instantiated arguments and their relationships using a logic-based formalism (an argument knowledge base), which is then input to an implemented argumentation framework that calculates extensions of arguments (an argument extensions knowledge base), and finally, we extract consistent sets of expressions (policy positions). The paper reports progress towards reasoning with web-based, distributed, collaborative, incomplete, and inconsistent knowledge bases expressed in natural language.
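    The last computational stage of the pipeline, calculating extensions of arguments, can be illustrated with a toy abstract argumentation framework. The sketch below computes the grounded extension from a set of arguments and attack pairs; the data structures and the three-argument example are hypothetical and merely stand in for the implemented framework referenced in the paper.

```python
def grounded_extension(arguments, attacks):
    """Grounded extension of an abstract argumentation framework.

    arguments: iterable of argument identifiers
    attacks:   set of (attacker, target) pairs
    """
    extension, defeated = set(), set()
    changed = True
    while changed:
        changed = False
        for a in arguments:
            if a in extension or a in defeated:
                continue
            attackers = {x for (x, y) in attacks if y == a}
            # An argument is acceptable once all of its attackers are defeated.
            if attackers <= defeated:
                extension.add(a)
                changed = True
        # Anything attacked by an accepted argument is defeated.
        new_defeated = {y for (x, y) in attacks if x in extension}
        if not new_defeated <= defeated:
            defeated |= new_defeated
            changed = True
    return extension

# Toy debate: a attacks b, b attacks c  ->  grounded extension {a, c}
print(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")}))
```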

    The quantum one-time pad in the presence of an eavesdropper

    A classical one-time pad allows two parties to send private messages over a public classical channel -- an eavesdropper who intercepts the communication learns nothing about the message. A quantum one-time pad is a shared quantum state which allows two parties to send private messages or private quantum states over a public quantum channel. If the eavesdropper intercepts the quantum communication she learns nothing about the message. In the classical case, a one-time pad can be created using shared and partially private correlations. Here we consider the quantum case in the presence of an eavesdropper, and find the single-letter formula for the rate at which the two parties can send messages using a quantum one-time pad.
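    For concreteness, the textbook form of the quantum one-time pad (stated here without the eavesdropper analysis or the rate formula derived in the paper) encrypts a qubit with two shared secret key bits using Pauli operators:

```latex
% Two shared, secret key bits (a, b) encrypt the qubit state \rho by
\[
  \rho \;\longmapsto\; X^{a} Z^{b} \,\rho\, Z^{b} X^{a},
\]
% and an eavesdropper who does not know (a, b) sees only the average,
% i.e. the maximally mixed state:
\[
  \frac{1}{4}\sum_{a,b \in \{0,1\}} X^{a} Z^{b} \rho\, Z^{b} X^{a}
  \;=\; \frac{\mathbb{1}}{2}.
\]
```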

    Locking of accessible information and implications for the security of quantum cryptography

    The unconditional security of a quantum key distribution protocol is often defined in terms of the accessible information, that is, the maximum mutual information between the distributed key S and the outcome of an optimal measurement on the adversary's (quantum) system. We show that, even if this quantity is small, certain parts of the key S might still be completely insecure when S is used in applications, such as for one-time pad encryption. This flaw is due to a locking property of the accessible information: one additional (physical) bit of information might increase the accessible information by more than one bit.
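    One common way to state the quantity and the flaw discussed here (notation illustrative): the accessible information is the mutual information between the key S and the adversary's measurement outcome, maximized over measurements M on her system E, and locking means that one extra bit B can raise it by more than one bit:

```latex
\[
  I_{\mathrm{acc}}(S ; E) \;=\; \max_{M}\; I\bigl(S ; M(E)\bigr),
  \qquad
  I_{\mathrm{acc}}(S ; E, B) \;>\; I_{\mathrm{acc}}(S ; E) + 1 .
\]
% Hence a small accessible information does not by itself make the key S
% safe to use, e.g. as a one-time pad.
```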

    Multidimensional reconciliation for continuous-variable quantum key distribution

    We propose a method for extracting an errorless secret key in a continuous-variable quantum key distribution protocol, which is based on Gaussian modulation of coherent states and homodyne detection. The crucial feature is an eight-dimensional reconciliation method, based on the algebraic properties of octonions. Since the protocol does not use any postselection, it can be proven secure against arbitrary collective attacks by using well-established theorems on the optimality of Gaussian attacks. By using this new coding scheme with an appropriate signal-to-noise ratio, the distance for secure continuous-variable quantum key distribution can be significantly extended.
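    The core of the eight-dimensional reconciliation step can be summarized as follows (the notation is a plausible reading of the scheme, not taken verbatim from the paper): Alice holds a Gaussian vector x in R^8 and Bob a noisy copy y; identifying R^8 with the octonion algebra lets Alice publish a rotation that maps her normalized vector onto a uniformly chosen binary codeword u without revealing u itself:

```latex
% Alice draws u uniformly from {-1/\sqrt{8}, +1/\sqrt{8}}^8 and announces
\[
  r \;=\; u \cdot \Bigl(\tfrac{x}{\lVert x \rVert}\Bigr)^{-1} \in \mathbb{O},
  \qquad\text{so that}\qquad
  r \cdot \tfrac{x}{\lVert x \rVert} \;=\; u .
\]
% Multiplication by a unit octonion is an isometry, so r alone carries no
% information about u, while Bob recovers a noisy codeword
\[
  r \cdot \tfrac{y}{\lVert y \rVert} \;\approx\; u ,
\]
% which a standard binary error-correcting code can then decode.
```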

    Broadening the Scope of Nanopublications

    In this paper, we present an approach for extending the existing concept of nanopublications --- tiny entities of scientific results in RDF representation --- to broaden their application range. The proposed extension uses English sentences to represent informal and underspecified scientific claims. These sentences follow a syntactic and semantic scheme that we call AIDA (Atomic, Independent, Declarative, Absolute), which provides a uniform and succinct representation of scientific assertions. Such AIDA nanopublications are compatible with the existing nanopublication concept and enjoy most of its advantages such as information sharing, interlinking of scientific findings, and detailed attribution, while being more flexible and applicable to a much wider range of scientific results. We show that users are able to create AIDA sentences for given scientific results quickly and at high quality, and that it is feasible to automatically extract and interlink AIDA nanopublications from existing unstructured data sources. To demonstrate our approach, a web-based interface is introduced, which also exemplifies the use of nanopublications for non-scientific content, including meta-nanopublications that describe other nanopublications.
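    To make the nanopublication structure concrete, the sketch below builds a minimal AIDA-style nanopublication as a set of named RDF graphs with rdflib. The base URI, the example sentence, and the hasAIDASentence predicate are placeholders for illustration, not the vocabulary mandated by the paper.

```python
from rdflib import Dataset, Literal, Namespace
from rdflib.namespace import RDF

NP = Namespace("http://www.nanopub.org/nschema#")
EX = Namespace("http://example.org/nanopub/np1#")   # placeholder base URI

ds = Dataset()
head = ds.graph(EX.head)
assertion = ds.graph(EX.assertion)
provenance = ds.graph(EX.provenance)
pubinfo = ds.graph(EX.pubinfo)

# The head graph ties the three parts together as one nanopublication.
head.add((EX.np1, RDF.type, NP.Nanopublication))
head.add((EX.np1, NP.hasAssertion, EX.assertion))
head.add((EX.np1, NP.hasProvenance, EX.provenance))
head.add((EX.np1, NP.hasPublicationInfo, EX.pubinfo))

# The assertion is just an AIDA sentence: an atomic, independent,
# declarative, absolute English claim, left informal and underspecified.
sentence = Literal("Boosted classification trees tend to overfit "
                   "conditional class probabilities.", lang="en")
assertion.add((EX.np1, EX.hasAIDASentence, sentence))  # illustrative predicate

# In a full nanopublication, the provenance and pubinfo graphs would carry
# attribution and publication metadata; they are left empty in this sketch.
print(ds.serialize(format="trig"))
```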