
    Fast concurrent object classification and localization

    Object localization and classification are important problems in computer vision. However, in many applications, exhaustive search over all class labels and image locations is computationally prohibitive. While several methods have been proposed to make either classification or localization more efficient, few have dealt with both tasks simultaneously. This paper proposes an efficient method for concurrent object localization and classification based on a data-dependent multi-class branch-and-bound formalism. Existing bag-of-features classification schemes, which can be expressed as weighted combinations of feature counts, can be readily adapted to our method. We present experimental results that demonstrate the merit of our algorithm in terms of classification accuracy, localization accuracy, and speed, compared to baseline approaches including exhaustive search, the ISM method, and single-class branch and bound.
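
    To make the branch-and-bound idea concrete, the following is a minimal Python sketch in the spirit of the abstract: a single priority queue over (class, region) states, where a region is a set of candidate boxes and the bound is computed from the positive and negative feature weights of a linear bag-of-features scorer. This is an illustrative reconstruction, not the paper's implementation; the names and the exact bounding scheme are assumptions.

        import heapq

        def bound(feats, region):
            # region = ((l_lo, l_hi), (r_lo, r_hi), (t_lo, t_hi), (b_lo, b_hi)):
            # intervals for the left, right, top and bottom edges of candidate boxes
            (l_lo, l_hi), (r_lo, r_hi), (t_lo, t_hi), (b_lo, b_hi) = region
            ub = 0.0
            for x, y, w in feats:                        # w = learned weight of this feature for one class
                in_largest = l_lo <= x <= r_hi and t_lo <= y <= b_hi
                in_smallest = l_hi <= x <= r_lo and t_hi <= y <= b_lo
                if w > 0 and in_largest:                 # positive words counted optimistically
                    ub += w
                elif w < 0 and in_smallest:              # negative words only if no box can avoid them
                    ub += w
            return ub

        def split(region):
            # halve the widest edge interval, producing two disjoint child regions
            widths = [hi - lo for lo, hi in region]
            i = widths.index(max(widths))
            lo, hi = region[i]
            mid = (lo + hi) // 2
            a, b = list(region), list(region)
            a[i], b[i] = (lo, mid), (mid + 1, hi)
            return tuple(a), tuple(b)

        def concurrent_search(class_feats, img_w, img_h):
            # class_feats[c] = list of (x, y, weight) for class c's bag-of-features model
            root = ((0, img_w), (0, img_w), (0, img_h), (0, img_h))
            heap = [(-bound(f, root), c, root) for c, f in class_feats.items()]
            heapq.heapify(heap)
            while heap:
                neg_ub, c, region = heapq.heappop(heap)
                if all(lo == hi for lo, hi in region):   # intervals collapsed: exact box, provably best
                    return c, region, -neg_ub
                for child in split(region):
                    heapq.heappush(heap, (-bound(class_feats[c], child), c, child))
            return None

    Because every class shares one queue, the search only refines the (class, region) states whose optimistic bound stays competitive, which is what makes localization and classification concurrent rather than sequential.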

    Fast concurrent object localization and recognition


    Optimization of Machine Learning Models with Segmentation to Determine the Pose of Cattle

    Image pattern recognition poses numerous challenges, particularly in feature recognition, making it a complex problem for machine learning algorithms. This study focuses on the problem of cow pose detection, involving the classification of cow images into categories such as front, right, left, and others. With the increasing popularity of image-based applications, such as object recognition in smartphone technologies, there is a growing need for accurate and efficient classification algorithms based on shape and color. In this paper, we propose a machine learning approach utilizing Support Vector Machine (SVM) and Random Forest (RF) algorithms for cow pose detection. To achieve an optimal model, we employ data augmentation techniques, including Gaussian blur, brightness adjustments, and segmentation. The segmentation methods used are Canny and K-means. We compare several machine learning algorithms to identify the optimal approach in terms of accuracy. The success of our method is measured by accuracy and Receiver Operating Characteristic (ROC) analysis. The results indicate that with Canny segmentation, SVM achieved 74.31% accuracy using a 90:10 train-test split, while RF achieved 99.60% accuracy with the same split. Furthermore, SVM with K-means segmentation reached an accuracy of 98.61% with an 80:20 split. The study demonstrates the effectiveness of SVM and Random Forest algorithms in cow pose detection, with K-means segmentation yielding highly accurate results. These findings hold promising implications for real-world applications in image-based recognition systems. Based on the models obtained, color-based segmentation proves very important in pattern recognition even when the target is shape recognition.
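
    As a rough illustration of the kind of pipeline described (not the authors' code), the sketch below applies Canny or K-means preprocessing and then compares SVM and Random Forest classifiers using OpenCV and scikit-learn; the thresholds, feature sizes, and split ratios are assumptions chosen only to mirror the 90:10 and 80:20 splits mentioned above.

        import cv2
        import numpy as np
        from sklearn.svm import SVC
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import accuracy_score

        def canny_features(img, size=(64, 64)):
            # edge map flattened into a feature vector (thresholds are illustrative)
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            edges = cv2.Canny(gray, 100, 200)
            return cv2.resize(edges, size).flatten() / 255.0

        def kmeans_features(img, k=4, size=(64, 64)):
            # color-quantized image (k dominant colors) flattened into a feature vector
            pixels = cv2.resize(img, size).reshape(-1, 3).astype(np.float32)
            criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
            _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 3,
                                            cv2.KMEANS_RANDOM_CENTERS)
            return centers[labels.flatten()].reshape(-1) / 255.0

        def evaluate(X, y, test_size):
            # test_size=0.1 mimics a 90:10 split, 0.2 an 80:20 split
            Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=test_size,
                                                  random_state=0, stratify=y)
            for name, clf in [("SVM", SVC(kernel="rbf")),
                              ("RF", RandomForestClassifier(n_estimators=200))]:
                clf.fit(Xtr, ytr)
                print(name, accuracy_score(yte, clf.predict(Xte)))

    Swapping canny_features for kmeans_features when building X is enough to reproduce the comparison of the two segmentation schemes under the same classifiers.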

    Image and video segmentation using graph cuts

    Includes abstract. Includes bibliographical references (leaves 67-71)

    Incorporating Boltzmann Machine Priors for Semantic Labeling in Images and Videos

    Semantic labeling is the task of assigning category labels to regions in an image. For example, a scene may consist of regions corresponding to categories such as sky, water, and ground, or parts of a face such as eyes, nose, and mouth. Semantic labeling is an important mid-level vision task for grouping and organizing image regions into coherent parts. Labeling these regions allows us to better understand the scene itself as well as properties of the objects in the scene, such as their parts, location, and interaction within the scene. Typical approaches for this task include the conditional random field (CRF), which is well suited to modeling local interactions among adjacent image regions. However, the CRF is limited in dealing with complex, global (long-range) interactions between regions in an image, and between frames in a video. This thesis presents approaches to modeling long-range interactions within images and videos, for use in semantic labeling. In order to model these long-range interactions, we incorporate priors based on the restricted Boltzmann machine (RBM). The RBM is a generative model which has demonstrated the ability to learn the shape of an object, and the CRBM is a temporal extension which can learn the motion of an object. Although the CRF is a good baseline labeler, we show how the RBM and CRBM can be added to the architecture to model both the global object shape within an image and the temporal dependencies of the object from previous frames in a video. We demonstrate the labeling performance of our models on the parts of complex face images from the Labeled Faces in the Wild database (for images) and the YouTube Faces Database (for videos). Our hybrid models produce results that are both quantitatively and qualitatively better than the baseline CRF alone for both images and videos.
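
    One hedged way to read the hybrid architecture described above (a sketch under assumed notation, not the thesis's exact formulation): the region labeling y for an image x is scored by a combined energy

        E(y \mid x) = \sum_i \phi_i(y_i, x) + \sum_{(i,j)} \psi_{ij}(y_i, y_j, x) + F_{\mathrm{RBM}}(y),
        \qquad F_{\mathrm{RBM}}(y) = -\log \sum_h \exp\!\left(h^\top W y + b^\top y + c^\top h\right),

    where the first two terms are the usual CRF unary and pairwise potentials over adjacent regions, and the third is the free energy of an RBM trained on whole label masks, which penalizes labelings whose global shape the RBM considers unlikely. In the video setting, a CRBM-style term would additionally condition the hidden units on the label masks of previous frames, supplying the temporal dependency the abstract mentions.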

    Simultaneous human segmentation, depth and pose estimation via dual decomposition

    The tasks of stereo matching, segmentation, and human pose estimation have been popular in computer vision in recent years, but attempts to combine the three tasks have so far resulted in compromises: either using infra-red cameras, or a greatly simplified body model. We propose a framework for estimating a detailed human skeleton in 3D from a stereo pair of images. Within this framework, we define an energy function that incorporates the relationship between the segmentation results, the pose estimation results, and the disparity space image. Specifically, we codify the assertions that foreground pixels should relate to some body part, should correspond to a continuous surface in the disparityspace image, and should be closer to the camera than the surrounding background pixels. Our energy function is NP-hard, however we show how to efficiently optimize a relaxation of it using dual decomposition. We show that applying this approach leads to improved results in all three tasks, and also introduce an extensive and challenging new dataset, which we use as a benchmark for evaluating 3D human pose estimation
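
    A generic sketch of the dual-decomposition idea the abstract relies on (not the paper's exact decomposition): write the intractable energy as a sum of tractable subproblems that share variables, give each subproblem its own copy x^k of the shared variables, and enforce agreement through Lagrange multipliers,

        E(x) = E_{\text{seg}}(x) + E_{\text{pose}}(x) + E_{\text{depth}}(x),
        \qquad \min_x E(x) \;\ge\; \max_{\lambda:\,\sum_k \lambda_k = 0} \; \sum_k \min_{x^k} \left[ E_k(x^k) + \lambda_k^\top x^k \right],

    then ascend the dual by solving each relaxed subproblem independently and updating the multipliers with a subgradient step \lambda_k \leftarrow \lambda_k + \alpha_t (x^k - \bar{x}), where \bar{x} is the average of the copies. The dual value lower-bounds the original minimum, so when the copies agree the relaxation is certified (near-)optimal.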

    Combining semantic information and image segmentation via Markov random fields

    The formulation of the image segmentation problem has evolved considerably from the early years of computer vision in the 1970s to the 2010s. While the initial studies offered mostly unsupervised approaches, a great deal of recent work has shifted towards supervised solutions. This is due to advances in cognitive science and their influence on computer vision research, and the accelerating availability of computational power also enables researchers to develop complex algorithms. Despite the great effort devoted to image segmentation research, state-of-the-art techniques still fall short of satisfying the needs of the subsequent processing steps of computer vision. This study is another attempt to generate a “substantially complete” segmentation output for the consumption of object classification, recognition, and detection steps. Our approach is to fuse multiple segmentation outputs in order to achieve the “best” result with respect to a cost function. The proposed approach, called Boosted-MRF, elegantly formulates the segmentation fusion problem as a Markov Random Fields (MRF) model in an unsupervised framework. For this purpose, a set of initial segmentation outputs is obtained and the consensus among the segmentation partitions is formulated in the energy function of the Markov Random Fields model. Finally, minimization of the energy function yields the “best” consensus among the segmentation ensemble. We proceed one step further to improve the performance of the Boosted-MRF by introducing auxiliary domain information into the segmentation fusion process. This enhanced segmentation fusion method, called the Domain Specific MRF, updates the energy function of the MRF model with information received from a domain expert. For this purpose, a top-down segmentation method is employed to obtain a set of Domain Specific Segmentation Maps, which are incomplete segmentations of a given image. Thus, in this second segmentation fusion method, in addition to the bottom-up segmentation ensemble, we generate an ensemble of top-down Domain Specific Segmentation Maps. Based on the bottom-up and top-down segmentation ensembles, a new MRF energy function is defined, and minimization of this energy function yields the “best” consensus which is consistent with the domain-specific information. The experiments performed on various datasets show that the proposed segmentation fusion methods improve on the performance of the segmentation outputs in the ensemble, as measured with various indices such as the Probabilistic Rand Index and Mutual Information. The Boosted-MRF method is also compared to a popular segmentation fusion method, namely Best of K, and is slightly better. The proposed Domain Specific MRF method is applied to a set of outdoor images with vegetation, where vegetation information is utilized as the domain-specific information; a slight improvement in performance is recorded in this experiment. The method is also applied to a remotely sensed dataset of building images, where more advanced domain-specific information is available, and segmentation quality is evaluated with a performance measure specifically defined for building images. In these two experiments with the Domain Specific MRF method, it is observed that, as long as reliable domain-specific information is available, the segmentation performance improves significantly.
    Ph.D. - Doctoral Program
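
    A hedged sketch of a consensus energy in the spirit of what the abstract describes (a generic co-association formulation, not necessarily the thesis's exact one): given K base segmentations, let w_{pq} be the fraction of them that place neighboring pixels p and q in the same segment; the fused labeling l then minimizes

        E(l) = \sum_{(p,q) \in N} \Big( w_{pq}\,[l_p \ne l_q] + (1 - w_{pq})\,[l_p = l_q] \Big),

    so that pairs the ensemble usually groups together are penalized for receiving different labels, and pairs it usually separates are penalized for receiving the same label. A Domain Specific variant along the lines described above would additionally add unary terms derived from the top-down Domain Specific Segmentation Maps, biasing the consensus toward the expert-supplied regions.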