
    Vision and language understanding with localized evidence

    Enabling machines to solve computer vision tasks with natural language components can greatly improve human interaction with computers. In this thesis, we address vision and language tasks with deep learning methods that explicitly localize relevant visual evidence. Spatial evidence localization in images enhances the interpretability of the model, while temporal localization in video is necessary to remove irrelevant content. We apply our methods to various vision and language tasks, including visual question answering, temporal activity detection, dense video captioning and cross-modal retrieval. First, we tackle the problem of image question answering, which requires the model to predict answers to questions posed about images. We design a memory network with a question-guided spatial attention mechanism which assigns higher weights to regions that are more relevant to the question. The visual evidence used to derive the answer can be shown by visualizing the attention weights in images. We then address the problem of localizing temporal evidence in videos. For most language/vision tasks, only part of the video is relevant to the linguistic component, so we need to detect these relevant events in videos. We propose an end-to-end model for temporal activity detection, which can detect arbitrary-length activities by coordinate regression with respect to anchors and contains a proposal stage to filter out background segments, saving computation time. We further extend activity category detection to event captioning, which can express richer semantic meaning than a class label. This leads to the problem of dense video captioning, which involves two sub-problems: localizing distinct events in long videos and generating captions for the localized events. We propose an end-to-end hierarchical captioning model with vision and language context modeling in which the captioning training affects the activity localization. Lastly, the task of text-to-clip video retrieval requires one to localize the specified query instead of detecting and captioning all events. We propose a model based on the early fusion of words and visual features, outperforming standard approaches which embed the whole sentence before performing late feature fusion. Furthermore, we use queries to regulate the proposal network to generate query-related proposals. In conclusion, our proposed visual localization mechanism applies across a variety of vision and language tasks and achieves state-of-the-art results. Together with the inference module, our work can contribute to solving other tasks such as video question answering in future research.
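    The question-guided spatial attention mentioned in this abstract can be illustrated with a minimal sketch. It assumes dot-product scoring followed by a softmax, which is one common choice and not necessarily the thesis's memory network; the names `question_guided_attention`, `regions` and `question` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def question_guided_attention(region_feats, question_vec):
    """Weight image regions by their dot-product relevance to the question.

    Assumption: region and question embeddings share one dimension.
    Returns the attention weights and the attention-weighted feature vector.
    """
    scores = [sum(r * q for r, q in zip(region, question_vec))
              for region in region_feats]
    weights = softmax(scores)
    dim = len(question_vec)
    attended = [sum(w * region[d] for w, region in zip(weights, region_feats))
                for d in range(dim)]
    return weights, attended

# Toy data: three regions with 2-D features, and one question embedding.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
question = [2.0, 0.0]
weights, attended = question_guided_attention(regions, question)
```

    The weights sum to one, so rendering them as a heat map over the image regions gives the kind of visual evidence display the abstract describes.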

    An Adaptive Color Image Segmentation

    A novel Adaptive Color Image Segmentation (ACIS) system for color image segmentation is presented. The proposed ACIS system uses a neural network with an architecture similar to the multilayer perceptron (MLP) network. The main difference is that the neurons here use a multisigmoid activation function. The multisigmoid function is the key to segmentation. The number of steps, i.e. thresholds, in the multisigmoid function depends on the number of clusters in the image. The threshold values for detecting the clusters and their labels are found automatically from the first-order derivative of the histograms of saturation and intensity in the HSV color space. Here, the main use of the neural network is to detect the number of objects in an image automatically. The advantage of this method is that no a priori knowledge is required to segment the color image. ACIS labels the objects with their mean colors. The algorithm is found to be reliable and works satisfactorily on different kinds of color images. Experimental results show that the performance of ACIS is also robust on noisy images.
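    A multisigmoid activation can be pictured as a staircase built from shifted sigmoids, one step per threshold. The sketch below is an assumed form, since the abstract does not give the exact function; `multisigmoid`, `cluster_label`, the `steepness` parameter and the threshold values are all hypothetical.

```python
import math

def multisigmoid(x, thresholds, steepness=50.0):
    """Staircase-like activation: one sigmoid step per threshold, scaled to [0, 1].

    Assumed form for illustration only; the ACIS paper may define it differently.
    """
    steps = [1.0 / (1.0 + math.exp(-steepness * (x - t))) for t in thresholds]
    return sum(steps) / len(thresholds)

def cluster_label(x, thresholds):
    """Index of the plateau that x falls on (number of thresholds below x)."""
    return sum(1 for t in thresholds if x > t)

# Two thresholds give three plateaus, i.e. three clusters.
thresholds = [0.3, 0.7]
levels = [round(multisigmoid(v, thresholds), 2) for v in (0.1, 0.5, 0.9)]
labels = [cluster_label(v, thresholds) for v in (0.1, 0.5, 0.9)]
```

    With k thresholds the function has k + 1 plateaus, which matches the abstract's statement that the number of steps depends on the number of clusters.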

    ROBUST AND PARALLEL SEGMENTATION MODEL (RPSM) FOR EARLY DETECTION OF SKIN CANCER DISEASE USING HETEROGENEOUS DISTRIBUTIONS

    Melanoma is the most common dangerous type of skin cancer; however, it is preventable if it is diagnosed early. Diagnosis of melanoma would be improved if an accurate skin image segmentation model were available. Many computer vision methods have been investigated, yet the problem of finding a consistent and robust model that extracts the best threshold value persists. This paper suggests a novel image segmentation approach using a multilevel cross-entropy thresholding algorithm based on heterogeneous distributions. The proposed strategy searches the problem space by segmenting the image into several levels and applying to each level one of three benchmark distributions (Gaussian, lognormal or gamma), which are combined to estimate the thresholds that optimally extract the segmented regions. The classical technique of Minimum Cross Entropy Thresholding (MCET) is used as the objective function for the applied method. Furthermore, a parallel processing algorithm is suggested to minimize the computational time of the proposed segmentation model in order to boost its performance. The efficiency of the proposed RPSM model is evaluated on two datasets of skin cancer images: the International Skin Imaging Collaboration (ISIC) dataset and PH2. In conclusion, the proposed RPSM model shows a significantly reduced processing time and delivers better accuracy and more stable results than other segmentation models. Design/methodology – The proposed model estimates two optimum threshold values that optimally extract three segmented regions by combining the three benchmark statistical distributions: gamma, Gaussian and lognormal. Outcomes – Based on the experimental results, the suggested segmentation methodology using MCET could be nominated as a robust, precise and reliable model with high efficiency. Novelty/utility – A novel multilevel segmentation model is developed using the MCET technique and based on a combination of three statistical distributions: gamma, Gaussian and lognormal. Moreover, this model is boosted by a parallelized method to reduce the processing time of the segmentation. Therefore, the suggested model should be considered a valuable tool in skin cancer detection.
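    The classical MCET objective named in this abstract can be sketched for a single threshold. This is the plain histogram-based criterion searched exhaustively, not the paper's multilevel method with fitted gamma, Gaussian or lognormal distributions, and it is not parallelized; the function name `mcet_threshold` and the toy histogram are hypothetical.

```python
import math

def mcet_threshold(hist):
    """Return the threshold t that minimizes the cross-entropy criterion.

    hist[i] is the pixel count at gray level i + 1; levels start at 1 so the
    logarithm of each class mean is defined. Pixels with level < t form the
    low class, the rest form the high class.
    """
    levels = range(1, len(hist) + 1)
    pairs = list(zip(levels, hist))
    best_t, best_cost = None, float("inf")
    for t in range(2, len(hist) + 1):
        low = [(g, h) for g, h in pairs if g < t]
        high = [(g, h) for g, h in pairs if g >= t]
        n_low = sum(h for _, h in low)
        n_high = sum(h for _, h in high)
        if n_low == 0 or n_high == 0:
            continue  # an empty class has no mean
        m_low = sum(g * h for g, h in low)
        m_high = sum(g * h for g, h in high)
        # Cross-entropy criterion with constant terms dropped.
        cost = (-m_low * math.log(m_low / n_low)
                - m_high * math.log(m_high / n_high))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

# Toy bimodal histogram over gray levels 1..10 (made-up counts).
hist = [0, 5, 9, 5, 0, 0, 0, 4, 8, 4]
t = mcet_threshold(hist)
```

    For this histogram the selected threshold falls in the empty gap between the two modes. A multilevel variant would search over pairs of thresholds with the same objective, which is where a faster search or parallel evaluation becomes useful.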

    A Novel Histogram-Based Multi-Threshold Searching Algorithm for Multilevel Color Thresholding

    Image segmentation is an important preliminary process required in object tracking applications. This paper addresses the issue of unsupervised multi-colour thresholding design for colour-based multiple-object segmentation. Most current unsupervised colour thresholding techniques require a supervised training algorithm or a cluster-number decision algorithm to obtain optimal threshold values of each colour channel for a colour-of-interest. In this paper, a novel unsupervised multi-threshold searching algorithm is proposed to automatically search for the optimal threshold values for segmenting multiple colour objects. To achieve this, a novel ratio-map image computation method is proposed to efficiently enhance the contrast between colour and non-colour pixels. Otsu's method is then applied to the ratio-map image to extract all colour objects from the image. Finally, a new histogram-based multi-threshold searching algorithm is developed to search for the optimal upper-bound and lower-bound threshold values of the hue, saturation and brightness components for each colour object. Experimental results show that the proposed method not only succeeds in separating all colour objects-of-interest in colour images, but also provides satisfactory colour thresholding results compared with an existing multilevel thresholding method.
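    Otsu's method, which the abstract applies to the ratio-map image, picks the threshold that maximizes the between-class variance of a histogram. The sketch below is a generic version; the ratio-map computation itself is not described in the abstract, so the histogram here is made-up data and `otsu_threshold` is a hypothetical name.

```python
def otsu_threshold(hist):
    """Return t such that bins <= t and bins > t maximize between-class variance."""
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0        # pixel count of the low class
    sum0 = 0      # intensity sum of the low class
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0:
            continue  # low class still empty
        if w1 == 0:
            break     # high class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# Made-up 8-bin histogram of quantized ratio-map values.
hist = [6, 8, 2, 0, 0, 1, 7, 9]
t = otsu_threshold(hist)
# Pixels in bins above t would be kept as colour-object candidates.
foreground_bins = [i for i in range(len(hist)) if i > t]
```

    In the pipeline the abstract outlines, the resulting mask would then feed the per-object search for upper and lower bounds on hue, saturation and brightness.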

    Segmentation of images by color features: a survey

    This article reviews the state of the art in color image segmentation. Image segmentation is an important stage for object recognition. Many methods have been proposed in the last few years for grayscale and color images. In this paper, we present an in-depth review of the state of the art in color image segmentation methods; we explain the techniques based on edge detection, thresholding, histogram thresholding, regions, feature clustering and neural networks. Because color spaces play a key role in the methods reviewed, we also explain in detail the color spaces most commonly used to represent and process colors. In addition, we present some important applications that use the image segmentation methods reviewed. Finally, a set of metrics frequently used to evaluate the segmented images quantitatively is shown.
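    As a small illustration of why the choice of color space matters for thresholding, the sketch below converts RGB pixels to HSV with Python's standard `colorsys` module and selects pixels by hue range. It is not taken from the survey; `hue_mask`, the hue bounds and the `min_saturation` cutoff are hypothetical choices.

```python
import colorsys

def hue_mask(pixels, hue_lo, hue_hi, min_saturation=0.2):
    """Mark pixels whose hue lies in [hue_lo, hue_hi], with hue scaled to [0, 1].

    The saturation cutoff excludes near-gray pixels, whose hue is unreliable.
    """
    mask = []
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        mask.append(hue_lo <= h <= hue_hi and s >= min_saturation)
    return mask

# Four 8-bit RGB pixels: red, green, blue and mid-gray.
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (128, 128, 128)]
green_mask = hue_mask(pixels, 0.25, 0.42)
```

    In HSV a single hue interval isolates the green pixel, whereas the same selection in RGB would need bounds on all three channels.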