    Committee-Based Sample Selection for Probabilistic Classifiers

    In many real-world learning tasks, it is expensive to acquire a sufficient number of labeled examples for training. This paper investigates methods for reducing annotation cost by "sample selection". In this approach, during training the learning program examines many unlabeled examples and selects for labeling only those that are most informative at each stage. This avoids redundantly labeling examples that contribute little new information. Our work builds on previous research on Query By Committee, extending the committee-based paradigm to the context of probabilistic classification. We describe a family of empirical methods for committee-based sample selection in probabilistic classification models, which evaluate the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned on the training set labeled so far. The method was applied to the real-world natural language processing task of stochastic part-of-speech tagging. We find that all variants of the method achieve a significant reduction in annotation cost, although their computational efficiency differs. In particular, the simplest variant, a two-member committee with no parameters to tune, gives excellent results. We also show that sample selection yields a significant reduction in the size of the model used by the tagger.
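The selection step described in this abstract can be illustrated with a minimal Python sketch. It assumes that disagreement is measured by vote entropy over the labels the committee members assign, which is one common choice; the abstract itself does not name a specific measure. The function names `vote_entropy` and `select_for_labeling` and the `threshold` parameter are hypothetical, and drawing the committee members from a distribution conditioned on the labeled data is left out.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy (in bits) of the labels a committee assigns to one example.

    Assumption: disagreement is measured by vote entropy; the abstract
    only speaks of "the degree of disagreement".
    """
    k = len(votes)
    counts = Counter(votes)
    return -sum((c / k) * math.log2(c / k) for c in counts.values())


def select_for_labeling(committee_votes, threshold=0.0):
    """Return indices of examples whose vote entropy exceeds the threshold.

    committee_votes[i] holds the labels each committee member assigned to
    example i. With two members and threshold 0.0, an example is selected
    exactly when the two members disagree.
    """
    return [
        i
        for i, votes in enumerate(committee_votes)
        if vote_entropy(votes) > threshold
    ]


# Hypothetical votes from a two-member committee on three examples.
votes = [["NOUN", "NOUN"], ["NOUN", "VERB"], ["VERB", "VERB"]]
selected = select_for_labeling(votes)
```

With a two-member committee the threshold of 0.0 needs no tuning, which mirrors the "no parameters to tune" variant the abstract mentions; larger committees would need a threshold chosen for the task.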

    Heteroscedasticity irrelevance when testing means difference

    Peer Reviewed

    A new definition of mixing and segregation: Three dimensions of a key process variable

    Although a number of definitions of mixing have been proposed in the literature, no single definition accurately and clearly describes the full range of problems in the field of industrial mixing. An alternative approach is proposed which defines segregation as being composed of three separate dimensions. The first dimension is the intensity of segregation, quantified by the normalized concentration variance (CoV); the second dimension is the scale of segregation, or clustering; and the last dimension is the exposure, or the potential to reduce segregation. The first dimension focuses on the instantaneous concentration variance; the second on the instantaneous length scales in the mixing field; and the third on the driving force for change, i.e. the mixing time scale, or the instantaneous rate of reduction in segregation. With these three dimensions in hand, it is possible to speak more clearly about what is meant by the control of segregation in industrial mixing processes. In this paper, the three dimensions of segregation are presented and defined in the context of previous definitions of mixing, and then applied to a range of industrial mixing problems to test their accuracy and robustness.
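The first dimension, intensity of segregation, can be illustrated with a short Python sketch. It assumes that CoV is the standard deviation of the sampled concentrations divided by their mean; the abstract says only "normalized concentration variance", so the exact normalization is an assumption. The function name `coefficient_of_variation` and the sample values are hypothetical. The scale and exposure dimensions are not sketched, because the abstract gives no formula for them.

```python
import statistics


def coefficient_of_variation(concentrations):
    """Intensity of segregation as CoV = standard deviation / mean.

    Assumption: population standard deviation over point samples of
    concentration taken at one instant in the mixing field.
    """
    mean = statistics.fmean(concentrations)
    if mean == 0:
        raise ValueError("mean concentration is zero; CoV is undefined")
    return statistics.pstdev(concentrations) / mean


# Hypothetical samples: a perfectly uniform field and a fully segregated one.
uniform = coefficient_of_variation([0.5, 0.5, 0.5, 0.5])
segregated = coefficient_of_variation([0.0, 1.0, 0.0, 1.0])
```

A CoV of zero corresponds to a uniform concentration field, and larger values indicate stronger segregation at the instant the samples were taken.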

    Incommensurability: An Overview

    Opening remarks delivered at "Incommensurability (and related matters)" conference, Hanover, June 199

    The Shape of Art History in the Eyes of the Machine

    How does the machine classify styles in art? And how does it relate to art historians' methods for analyzing style? Several studies have shown the ability of the machine to learn and predict style categories, such as Renaissance, Baroque, Impressionism, etc., from images of paintings. This implies that the machine can learn an internal representation encoding discriminative features through its visual analysis. However, such a representation is not necessarily interpretable. We conducted a comprehensive study of several state-of-the-art convolutional neural networks applied to the task of style classification on 77K images of paintings, and analyzed the learned representation through correlation analysis with concepts derived from art history. Surprisingly, the networks could place the works of art in a smooth temporal arrangement mainly based on learning style labels, without any a priori knowledge of time of creation, the historical time and context of styles, or relations between styles. The learned representations showed that there are few underlying factors that explain the visual variations of style in art. Some of these factors were found to correlate with style patterns suggested by Heinrich Wölfflin (1864-1945). The learned representations also consistently highlighted certain artists as the most distinctive representatives of their styles, which quantitatively confirms art historians' observations.