14 research outputs found

    Tractability of Theory Patching

    In this paper we consider the problem of 'theory patching', in which we are given a domain theory, some of whose components are indicated to be possibly flawed, and a set of labeled training examples for the domain concept. The theory patching problem is to revise only the indicated components of the theory, such that the resulting theory correctly classifies all the training examples. Theory patching is thus a type of theory revision in which revisions are made to individual components of the theory. Our concern in this paper is to determine for which classes of logical domain theories the theory patching problem is tractable. We consider both propositional and first-order domain theories, and show that the theory patching problem is equivalent to that of determining what information contained in a theory is 'stable' regardless of what revisions might be performed to the theory. We show that determining stability is tractable if the input theory satisfies two conditions: that revisions to each theory component have monotonic effects on the classification of examples, and that theory components act independently in the classification of examples in the theory. We also show how the concepts introduced can be used to determine the soundness and completeness of particular theory patching algorithms.
    Comment: See http://www.jair.org/ for any accompanying file.

    Committee-Based Sample Selection for Probabilistic Classifiers

    In many real-world learning tasks, it is expensive to acquire a sufficient number of labeled examples for training. This paper investigates methods for reducing annotation cost by 'sample selection'. In this approach, during training the learning program examines many unlabeled examples and selects for labeling only those that are most informative at each stage. This avoids redundantly labeling examples that contribute little new information. Our work follows on previous research on Query By Committee, extending the committee-based paradigm to the context of probabilistic classification. We describe a family of empirical methods for committee-based sample selection in probabilistic classification models, which evaluate the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned by the training set labeled so far. The method was applied to the real-world natural language processing task of stochastic part-of-speech tagging. We find that all variants of the method achieve a significant reduction in annotation cost, although their computational efficiency differs. In particular, the simplest variant, a two-member committee with no parameters to tune, gives excellent results. We also show that sample selection yields a significant reduction in the size of the model used by the tagger.
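
    The committee idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name a disagreement measure, so vote entropy is used here as one plausible choice, and the Dirichlet posterior over label counts, the function names, and the `prior` parameter are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def sample_member(counts, rng, prior=1.0):
    """Draw one class-probability vector from a Dirichlet posterior.

    `counts` maps each label to how often it was seen in the labeled data
    (hypothetical statistics; a real tagger would condition on context).
    """
    draws = {label: rng.gammavariate(n + prior, 1.0) for label, n in counts.items()}
    total = sum(draws.values())
    return {label: g / total for label, g in draws.items()}


def vote_entropy(votes):
    """Entropy in bits of the committee's label votes; 0 means full agreement."""
    k = len(votes)
    return -sum((c / k) * math.log2(c / k) for c in Counter(votes).values())


def committee_disagreement(counts, committee_size=2, seed=0):
    """Sample a committee, let each member vote for its best label, score disagreement."""
    rng = random.Random(seed)
    votes = []
    for _ in range(committee_size):
        member = sample_member(counts, rng)
        votes.append(max(member, key=member.get))
    return vote_entropy(votes)
```

    An example would be sent for labeling when its disagreement score is high. With few labeled examples the sampled members differ widely, so disagreement is common; as counts grow, the posterior sharpens and fewer examples are selected.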

    Training data cleaning for text classification

    In text classification (TC) and other tasks involving supervised learning, labelled data may be scarce or expensive to obtain; strategies are thus needed for maximizing the effectiveness of the resulting classifiers while minimizing the required amount of training effort. Training data cleaning (TDC) consists in devising ranking functions that sort the original training examples in terms of how likely it is that the human annotator has misclassified them, thereby providing a convenient means for the human annotator to revise the training set so as to improve its quality. Working in the context of boosting-based learning methods, we present three different techniques for performing TDC and, on two widely used TC benchmarks, evaluate them by their capability of spotting misclassified texts purposefully inserted in the training set.
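
    A ranking function of the kind described can be sketched as below. The abstract does not specify the three techniques, so this is only a generic confidence-based ranking under stated assumptions: each training example carries a binary label (+1 or -1) and a real-valued classifier score whose sign is the predicted label; `rank_suspects` and the tuple layout are hypothetical.

```python
def rank_suspects(examples):
    """Sort training examples from most to least likely mislabeled.

    `examples` is a list of (doc_id, assigned_label, score) tuples, where
    assigned_label is +1 or -1 and score is a signed classifier confidence.
    An example is suspicious when the classifier confidently disagrees with
    its assigned label, i.e. when -assigned_label * score is large.
    """
    ranked = sorted(examples, key=lambda ex: -ex[1] * ex[2], reverse=True)
    return [doc_id for doc_id, _, _ in ranked]
```

    An annotator would then inspect documents from the top of the returned list and stop once the examples look correctly labeled.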

    Improving Inter-level Communication in Cascaded Finite-State Partial Parsers

    An improved inter-level communication strategy that enhances the capabilities of cascaded finite-state partial parsing systems is presented. Cascaded automata are allowed to make forward calls to other automata in the cascade as well as backward references to previously identified groupings. The approach is more powerful than a design in which the output of the current level is simply passed to the next level in the cascade. The approach is evaluated on randomly extracted sentences from the Encarta encyclopedia. A discussion of related research is also presented.

    Active Learning for Duplicate Record Identification in Deep Web


    An Approximate Algorithm for Reverse Engineering of Multi-layer Perceptrons


    Stream-based active unusual event detection

    We present a new active learning approach that incorporates human feedback for on-line unusual event detection. In contrast to most existing unsupervised methods, which perform passive mining for unusual events, our approach automatically requests supervision for critical points to resolve ambiguities of interest, leading to more robust and accurate detection of subtle unusual events. The active learning strategy is formulated as a stream-based solution, i.e., it decides on the fly whether to query for labels. It adaptively combines multiple active learning criteria to achieve (i) quick discovery of unknown event classes and (ii) refinement of the classification boundary. Experimental results on busy public space videos show that, with minimal human supervision, our approach outperforms existing supervised and unsupervised learning strategies in identifying unusual events. In addition, the adaptive multi-criteria approach achieves better performance than existing single-criterion and multi-criteria active learning strategies.
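
    A stream-based query decision that mixes two criteria might look like the sketch below. It is an illustration only: the abstract does not give the criteria or how they are adapted, so the margin-based boundary score, the likelihood-based novelty score, the fixed weight `w_boundary`, and the `threshold` are all assumptions, and the weight is kept constant here rather than adapted.

```python
def margin_uncertainty(posteriors):
    """1 minus the gap between the two largest class probabilities.

    Close to 1 when the sample lies near a classification boundary.
    Requires at least two classes.
    """
    top = sorted(posteriors, reverse=True)
    return 1.0 - (top[0] - top[1])


def should_query(posteriors, max_likelihood, w_boundary=0.5, threshold=0.6):
    """Decide, for one incoming sample, whether to ask a human for its label.

    `posteriors` are class probabilities under the current model;
    `max_likelihood` in [0, 1] is a normalized score of how well the best
    known class explains the sample (low values suggest an unknown class).
    """
    novelty = 1.0 - max_likelihood
    score = w_boundary * margin_uncertainty(posteriors) + (1.0 - w_boundary) * novelty
    return score >= threshold
```

    Because the decision uses only the current sample and the current model, it can be made as each sample arrives, without storing a pool of unlabeled data.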

    Visual self-localization with tiny images

    Self-localization of mobile robots is often performed visually, and the resolution of the images strongly influences the computation time. In this paper, we examine how a reduction of the image resolution affects localization accuracy. We downscale the images, preserving their aspect ratio, down to tiny resolutions of 15×11 and 20×15 pixels. Our results are based on extensive tests on different datasets that have been recorded indoors by a small differential drive robot and outdoors by a flying quadrocopter. Four well-known global image features and a pixelwise image comparison method are compared under realistic conditions such as illumination changes and translations. Our results show that even when reducing the image resolution down to the tiny resolutions above, accurate localization is achievable. In this way, we can speed up the localization process considerably.
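
    The pixelwise comparison on tiny images can be sketched as follows. This is a rough illustration under assumptions, not the method evaluated in the paper: images are plain lists of grayscale rows, downscaling uses block averaging, the distance is a mean absolute difference, and the names `downscale`, `pixelwise_distance`, and `localize` are hypothetical.

```python
def downscale(image, out_w, out_h):
    """Block-average a grayscale image (list of rows) down to out_w x out_h.

    Assumes the input is at least as large as the output in both dimensions.
    """
    in_h, in_w = len(image), len(image[0])
    result = []
    for y in range(out_h):
        y0, y1 = y * in_h // out_h, (y + 1) * in_h // out_h
        row = []
        for x in range(out_w):
            x0, x1 = x * in_w // out_w, (x + 1) * in_w // out_w
            block = [image[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def pixelwise_distance(a, b):
    """Mean absolute difference between two equally sized images."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def localize(query, references, size=(20, 15)):
    """Return the name of the reference image closest to the query at tiny size.

    `references` maps a place name to a full-resolution reference image.
    """
    small_query = downscale(query, *size)
    return min(
        references,
        key=lambda name: pixelwise_distance(small_query, downscale(references[name], *size)),
    )
```

    At 20×15 pixels each comparison touches only 300 values, which is where the speed-up mentioned in the abstract would come from. In practice the reference images would be downscaled once and cached rather than on every query.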