1,109 research outputs found

    A review of multi-instance learning assumptions

    Multi-instance (MI) learning is a variant of inductive machine learning in which each learning example contains a bag of instances instead of a single feature vector. The term commonly refers to the supervised setting, where each bag is associated with a label. This type of representation is a natural fit for a number of real-world learning scenarios, including drug activity prediction and image classification, hence many MI learning algorithms have been proposed. Any MI learning method must relate instances to bag-level class labels, but many types of relationships between instances and class labels are possible. Although all early work in MI learning assumes a specific MI concept class known to be appropriate for a drug activity prediction domain, this ‘standard MI assumption’ is not guaranteed to hold in other domains. Much of the recent work in MI learning has concentrated on a relaxed view of the MI problem, where the standard MI assumption is dropped and alternative assumptions are considered instead. However, it is often not clearly stated which particular assumption is used and how it relates to other assumptions that have been proposed. In this paper, we aim to clarify the use of alternative MI assumptions by reviewing the work done in this area.
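
    The abstract above refers to the 'standard MI assumption' without spelling it out. In the MI literature it is usually stated as: a bag is positive if and only if at least one of its instances is positive. The following is a minimal sketch of that rule only; the function name, the toy instance-level concept, and the example bags are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence

Instance = Sequence[float]


def bag_label_standard_mi(
    bag: Sequence[Instance],
    instance_is_positive: Callable[[Instance], bool],
) -> bool:
    """Label a bag under the standard MI assumption.

    The bag is positive if and only if at least one instance in it
    is positive according to the (hidden) instance-level concept.
    """
    return any(instance_is_positive(x) for x in bag)


def toy_concept(x: Instance) -> bool:
    # Hypothetical instance-level concept, for illustration only:
    # an instance is positive when its first feature exceeds 0.5.
    return x[0] > 0.5


positive_bag = [[0.1, 0.2], [0.9, 0.3]]  # second instance is positive
negative_bag = [[0.1, 0.2], [0.4, 0.7]]  # no positive instance
```

    Relaxed MI assumptions of the kind the review surveys replace the `any` rule with some other relationship between instance-level and bag-level labels.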

    A Review of Codebook Models in Patch-Based Visual Object Recognition

    The codebook model-based approach, while ignoring any structural aspect in vision, nonetheless provides state-of-the-art performance on current datasets. The key role of a visual codebook is to provide a way to map low-level features into a fixed-length vector in histogram space, to which standard classifiers can be directly applied. The discriminative power of such a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Thus, the construction of a codebook is an important step, which is usually done by cluster analysis. However, clustering is a process that retains regions of high density in a distribution, and it follows that the resulting codebook need not have discriminant properties. Clustering is also recognised as a computational bottleneck of such systems. In our recent work, we proposed a resource-allocating codebook for constructing a discriminant codebook in a one-pass design procedure that slightly outperforms more traditional approaches at drastically reduced computing times. In this review we survey several approaches that have been proposed over the last decade, covering their use of feature detectors, descriptors, codebook construction schemes, choice of classifiers in recognising objects, and the datasets that were used in evaluating the proposed methods.
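
    The mapping from low-level features to a fixed-length histogram described above can be sketched as follows. This assumes hard assignment of each descriptor to its nearest codeword under Euclidean distance and L1 normalisation of the counts; the abstract does not specify either choice, and the function name and toy data are hypothetical.

```python
import numpy as np


def encode_histogram(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map a variable number of local descriptors to a fixed-length histogram.

    descriptors: array of shape (n, d), one row per local feature.
    codebook:    array of shape (k, d), one row per codeword.
    Returns an array of shape (k,) that sums to 1 (all zeros if n == 0).
    """
    # Squared Euclidean distance from every descriptor to every codeword.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)  # shape (n, k)

    # Hard assignment: index of the nearest codeword for each descriptor.
    nearest = np.argmin(sq_dists, axis=1)

    # Count assignments per codeword; the length is always k.
    counts = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts


# Toy example: two codewords, three descriptors in 2-D.
toy_codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
toy_descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]])
toy_histogram = encode_histogram(toy_descriptors, toy_codebook)
```

    The output length depends only on the codebook size, which is why the abstract ties model complexity to the number of codewords.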

    k-NN Boosting Prototype Learning for Object Classification

    Object classification is a challenging task in computer vision. Many approaches have been proposed to extract meaningful descriptors from images and classify them in a supervised learning framework. In this paper, we revisit the classic k-nearest neighbors (k-NN) classification rule, which has been shown to be very effective when dealing with local image descriptors. However, k-NN still has some major drawbacks, mainly due to the uniform voting among the nearest prototypes in the feature space. We propose a generalization of the classic k-NN rule in a supervised learning (boosting) framework. Namely, we redefine the voting rule as a strong classifier that linearly combines predictions from the k closest prototypes. To induce this classifier, we propose a novel learning algorithm, MLNN (Multiclass Leveraged Nearest Neighbors), which gives a simple procedure for performing prototype selection very efficiently. We tested our method on 12 categories of objects and observed significant improvement over classic k-NN in terms of classification performance.
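
    The difference between uniform voting and a voting rule that linearly combines the k closest prototypes can be illustrated with a small sketch. This is not the MLNN algorithm: the per-prototype coefficients `alphas` are simply given here, whereas the paper learns them by boosting. The function name, the coefficients, and the toy data are all hypothetical.

```python
import numpy as np


def leveraged_knn_predict(query, prototypes, labels, alphas, k, n_classes):
    """Classify `query` by a weighted vote among its k nearest prototypes.

    Each of the k nearest prototypes j adds its coefficient alphas[j]
    to the score of its own class labels[j]. With all coefficients
    equal to 1, this reduces to classic uniform k-NN voting.
    """
    dists = np.linalg.norm(prototypes - query, axis=1)
    nearest = np.argsort(dists)[:k]

    scores = np.zeros(n_classes)
    for j in nearest:
        scores[labels[j]] += alphas[j]
    return int(np.argmax(scores)), scores


toy_prototypes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
toy_labels = np.array([0, 1, 1, 0])
toy_query = np.array([0.2, 0.2])

# Uniform voting: the two class-1 neighbours outvote the single class-0 one.
uniform_pred, _ = leveraged_knn_predict(
    toy_query, toy_prototypes, toy_labels, np.ones(4), k=3, n_classes=2
)

# Weighted voting: a large (assumed) coefficient on prototype 0 flips the vote.
weighted_pred, _ = leveraged_knn_predict(
    toy_query, toy_prototypes, toy_labels,
    np.array([3.0, 1.0, 1.0, 1.0]), k=3, n_classes=2,
)
```

    In this sketch a prototype with a coefficient of zero never influences any prediction, which is one way to see how learning the coefficients can double as prototype selection.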

    Fixation prediction with a combined model of bottom-up saliency and vanishing point

    By predicting where humans look in natural scenes, we can understand how they perceive complex natural scenes and prioritize information for further high-level visual processing. Several models have been proposed for this purpose, yet there is a gap between the best existing saliency models and human performance. While many researchers have developed purely computational models for fixation prediction, fewer attempts have been made to discover cognitive factors that guide gaze. Here, we study the effect of a particular type of scene structural information, known as the vanishing point, and show that human gaze is attracted to vanishing point regions. We record eye movements of 10 observers over 532 images, of which 319 have vanishing points. We then construct a combined model of traditional saliency and a vanishing point channel and show that our model outperforms state-of-the-art saliency models using three scores on our dataset. Comment: arXiv admin note: text overlap with arXiv:1512.0172
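
    One simple way to picture a combined model of a saliency map and a vanishing point channel is a weighted blend of two normalised maps. The abstract does not say how the two are combined, so everything in this sketch is an assumption: the Gaussian shape of the vanishing point channel, the `sigma` and `weight` parameters, the linear blend, and all function names.

```python
import numpy as np


def vanishing_point_channel(shape, vp_xy, sigma):
    """Gaussian bump centred on the vanishing point (assumed channel shape).

    shape: (height, width) of the image; vp_xy: (x, y) pixel coordinates.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    x0, y0 = vp_xy
    return np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2.0 * sigma ** 2))


def normalize(m):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)


def combine(saliency, vp_map, weight=0.5):
    """Linear blend of the two normalised maps (assumed combination rule)."""
    return (1.0 - weight) * normalize(saliency) + weight * normalize(vp_map)


# Toy example: a flat saliency map, so the blend peaks at the vanishing point.
flat_saliency = np.zeros((20, 30))
vp_map = vanishing_point_channel((20, 30), vp_xy=(12, 7), sigma=3.0)
combined = combine(flat_saliency, vp_map, weight=0.5)
peak_row, peak_col = np.unravel_index(np.argmax(combined), combined.shape)
```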

    A Framework For Learning Scene Independent Edge Detection

    This work introduces a framework for a system that intelligently assigns an edge detection filter to an image based on features taken from the image. The framework has four parts: the learning stage, image feature extraction, training filter creation, and filter selection training. Two prototype systems of this framework are given. The learning stage for these systems is the Berkeley Segmentation Database coupled with the Baddeley Delta Metric. Feature extraction is performed using a GIST methodology, which extracts color, intensity, and orientation information. The set of image features is used as the input to a single-hidden-layer feedforward neural network trained using backpropagation. The system trains against a set of linear cellular automata filters which are determined to best solve the edge image according to the Baddeley Delta Metric. One system uses cellular automata augmented with a fuzzy rule. The systems are trained and tested against the images from the Berkeley Segmentation Database. The test results indicate that systems built on this framework can, on average, perform better than standard methods of edge detection across many types of images.