
    Pairwise Quantization

    We consider the task of lossy compression of high-dimensional vectors through quantization. We propose an approach that learns quantization parameters by minimizing the distortion of scalar products and squared distances between pairs of points. This is in contrast to previous works, which obtain these parameters by minimizing the reconstruction error of individual points. The proposed approach proceeds by finding a linear transformation of the data that effectively reduces the minimization of the pairwise distortions to the minimization of individual reconstruction errors. After such a transformation, any of the previously proposed quantization approaches can be used. Despite the simplicity of this transformation, our experiments demonstrate that it achieves a considerable reduction of the pairwise distortions compared to applying quantization directly to the untransformed data.
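
    The pipeline can be sketched in a few lines: learn a linear map from the data, apply it, and hand the transformed vectors to any standard quantizer. The covariance-based scaling below is a hypothetical stand-in for the transformation derived in the paper, shown only to illustrate the two-step structure.

        import numpy as np

        def learn_transform(X):
            # Center the data and build a covariance-based linear map.
            # The eigenvalue scaling is an illustrative placeholder, not
            # the paper's derived transformation.
            mu = X.mean(axis=0)
            Xc = X - mu
            cov = Xc.T @ Xc / len(Xc)
            eigvals, eigvecs = np.linalg.eigh(cov)
            A = eigvecs * np.sqrt(np.maximum(eigvals, 1e-12))
            return Xc @ A, A, mu

        rng = np.random.default_rng(0)
        X = rng.standard_normal((1000, 64)).astype(np.float32)
        Z, A, mu = learn_transform(X)
        # Z can now be fed to any off-the-shelf quantizer (k-means,
        # product quantization, ...) trained on reconstruction error.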

    Segmentation Driven Object Detection with Fisher Vectors

    We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks that suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor and an inter-category rescoring mechanism. Our experiments on the VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.
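
    The mask-based re-weighting step admits a simple sketch: each local descriptor is scaled by the tentative foreground mask value at its image location before Fisher-vector aggregation. The function name and the soft [0, 1] mask are illustrative assumptions, not the authors' exact implementation.

        import numpy as np

        def reweight_local_features(descriptors, positions, mask):
            # descriptors: (N, D) local descriptors (e.g., SIFT + color)
            # positions:   (N, 2) integer (x, y) pixel locations
            # mask:        (H, W) tentative object segmentation mask in [0, 1]
            weights = mask[positions[:, 1], positions[:, 0]]
            # Background descriptors (mask ~ 0) are suppressed before the
            # Fisher-vector aggregation step.
            return descriptors * weights[:, None]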

    A Novel Image Retrieval Based on a Combination of Local and Global Histograms of Visual Words

    Content-based image retrieval (CBIR) provides a sustainable solution for retrieving similar images from an image archive. In recent years, the Bag-of-Visual-Words (BoVW) model has gained attention and significantly improved the performance of image retrieval. In the standard BoVW model, an image is represented as an orderless global histogram of visual words, ignoring the spatial layout. The spatial layout of an image carries significant information that can enhance the performance of CBIR. In this paper, we present a novel image representation based on a combination of local and global histograms of visual words. The global histogram of visual words is constructed over the whole image, while the local histogram of visual words is constructed over a local rectangular region of the image. The local histogram captures spatial information about the salient objects. Extensive experiments and comparisons conducted on the Corel-A, Caltech-256, and Ground Truth image datasets demonstrate that the proposed image representation improves the performance of image retrieval.
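
    A minimal sketch of the combined representation, assuming visual-word assignments and keypoint positions are already computed; the placement of the local rectangular region and the L1 normalization are assumptions made for illustration.

        import numpy as np

        def bovw_histogram(word_ids, vocab_size):
            # Orderless histogram of visual-word occurrences, L1-normalized.
            hist = np.bincount(word_ids, minlength=vocab_size).astype(np.float32)
            return hist / max(hist.sum(), 1.0)

        def combined_representation(word_ids, positions, region, vocab_size):
            # region = (x0, y0, x1, y1): local rectangular window, e.g.
            # around the salient object.
            x0, y0, x1, y1 = region
            inside = ((positions[:, 0] >= x0) & (positions[:, 0] < x1) &
                      (positions[:, 1] >= y0) & (positions[:, 1] < y1))
            global_hist = bovw_histogram(word_ids, vocab_size)
            local_hist = bovw_histogram(word_ids[inside], vocab_size)
            # Concatenation yields a 2 * vocab_size feature vector that
            # keeps both the global statistics and the local layout cue.
            return np.concatenate([global_hist, local_hist])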

    Human Action Localization And Recognition In Unconstrained Videos

    As imaging systems become ubiquitous, the ability to recognize human actions is becoming increasingly important. Just as in the object detection and recognition literature, action recognition can be roughly divided into classification tasks, where the goal is to classify a video according to the action depicted in it, and detection tasks, where the goal is to detect and localize a human performing a particular action. A growing literature demonstrates the benefits of localizing discriminative sub-regions of images and videos when performing recognition tasks. In this thesis, we address the action detection and recognition problems.

    First, action detection in video is a particularly difficult problem because actions must not only be recognized correctly, but must also be localized in the 3D spatio-temporal volume. We introduce a technique that transforms the 3D localization problem into a series of 2D detection tasks. This is accomplished by dividing the video into overlapping segments, then representing each segment with a 2D video projection. The advantage of the 2D projection is that it makes it convenient to apply the best techniques from object detection to the action detection problem. We also introduce a novel, straightforward method for searching the 2D projections to localize actions, termed Two-Point Subwindow Search (TPSS). Finally, we show how to connect the local detections in time using a chaining algorithm to identify the entire extent of the action. Our experiments show that video projection outperforms the latest results on action detection in a direct comparison.

    Second, we present a probabilistic model that learns to identify discriminative regions in videos from weakly supervised data, where each video clip is assigned only a label describing what action is present in the frame or clip. While our first system requires every action to be manually outlined in every frame of the video, this second system requires only that the video be given a single high-level tag. From this data, the system is able to identify discriminative regions that correspond well to the regions containing the actual actions. Our experiments on both the MSR Action Dataset II and the UCF Sports Dataset show that the localizations produced by this weakly supervised system are comparable in quality to those produced by systems that require each frame to be manually annotated. This system is able to detect actions in both 1) non-temporally segmented action videos and 2) recognition tasks where a single label is assigned to the clip. We also demonstrate the action recognition performance of our method on two complex datasets, i.e. HMDB and UCF101.

    Third, we extend our weakly supervised framework by replacing the recognition stage with a two-stage neural network and apply dropout to prevent overfitting of the parameters on the training data. The dropout technique was recently introduced to prevent overfitting of the parameters in deep neural networks and has been applied successfully to the object recognition problem. To our knowledge, this is the first system using dropout for the action recognition problem. We demonstrate that using dropout improves the action recognition accuracies on the HMDB and UCF101 datasets.
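
    The segment-and-project step of the first contribution can be sketched as follows; the max-projection used here is an illustrative choice, not necessarily the projection developed in the thesis.

        import numpy as np

        def segment_projections(video, seg_len=30, stride=15):
            # video: (T, H, W) array of grayscale frames.
            # Divide the video into overlapping temporal segments and
            # collapse each one to a single 2D image.
            projections = []
            for start in range(0, max(len(video) - seg_len, 0) + 1, stride):
                segment = video[start:start + seg_len]
                projections.append((start, segment.max(axis=0)))
            return projections

        # Each (start_frame, image) pair can be scored with an ordinary 2D
        # object detector; overlapping detections are then chained over
        # time to recover the full spatio-temporal extent of the action.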

    Computer Vision for Tissue Characterization and Outcome Prediction in Cancer

    The aim of this dissertation was to investigate the use of computer vision for tissue characterization and patient outcome prediction in cancer. This work focused on the analysis of digitized tissue specimens stained only for basic morphology (i.e. hematoxylin and eosin). The applicability of texture analysis and convolutional neural networks was evaluated for detection of biologically and clinically relevant features. Moreover, novel approaches to guide ground-truth annotation, and outcome-supervised learning for prediction of patient survival directly from tumor tissue images without expert guidance, were investigated. We first studied quantification of tumor viability through segmentation of necrotic and viable tissue compartments. We developed a regional texture analysis method, which was trained and tested on whole sections of mouse xenograft models of human lung cancer. Our experiments showed that the proposed segmentation was able to discriminate between viable and non-viable tissue regions with high accuracy when compared to human expert assessment. We next investigated the feasibility of pre-trained convolutional neural networks in the analysis of breast cancer tissue, aiming to quantify tumor-infiltrating lymphocytes in the specimens. Interestingly, our results showed that pre-trained convolutional neural networks can be adapted for the analysis of histological image data, outperforming texture analysis. The results also indicated that the computerized assessment was on par with pathologist assessments. Moreover, the study presented an image annotation technique guided by specific antibody staining for improved ground-truth labeling. Direct outcome prediction in breast cancer was then studied using a nationwide patient cohort. A computerized pipeline, which incorporated orderless feature aggregation and convolutional image descriptors for outcome-supervised classification, resulted in a risk grouping that was predictive of both disease-specific and overall survival. Surprisingly, further analysis suggested that the computerized risk prediction was also an independent prognostic factor that provided information complementary to the standard clinicopathological factors. This doctoral thesis demonstrated how computer-vision methods can be powerful tools in the analysis of cancer tissue samples, highlighting strategies for supervised characterization of tissue entities and an approach for the identification of novel prognostic morphological features.

    Visual examination of the microscopic features of tissue samples is one of the most important assessments in the diagnosis of cancer patients and in treatment planning. Advanced imaging technologies have made it possible to digitize histological tumor tissue specimens at high resolution. As a result of this digitization, advanced machine-learning-based computer vision methods can be applied to their analysis. This dissertation investigates the application of computer vision methods to the computational analysis of cancer tissue specimens. The work examines the automatic quantification of individual histological entities, such as necrotic tissue and immune cells. In addition, the work presents a method for predicting patient survival based solely on tissue morphology.
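
    As a sketch of the transfer-learning setup described above, a pre-trained network can serve as a fixed feature extractor for stained tissue patches; the backbone choice (VGG-16) and the pooling step are assumptions made for illustration.

        import torch
        from torchvision import models, transforms

        # ImageNet-pretrained backbone used as a fixed feature extractor.
        backbone = models.vgg16(pretrained=True).features.eval()

        preprocess = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])

        def tile_features(tile):
            # tile: an H&E image patch (PIL.Image, RGB); returns a feature
            # vector via global average pooling of the last conv feature map.
            with torch.no_grad():
                fmap = backbone(preprocess(tile).unsqueeze(0))
            return fmap.mean(dim=(2, 3)).squeeze(0)

        # A conventional classifier (e.g., an SVM) trained on such features
        # can then separate tissue classes such as lymphocyte-rich regions.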

    Contextual Person Identification in Multimedia Data

    We propose methods to improve automatic person identification, regardless of the visibility of a face, by integrating multiple cues, including multiple modalities and contextual information. We propose a joint learning approach that uses contextual information from videos to improve learned face models. Further, we integrate additional modalities in a global fusion framework. We evaluate our approaches on a novel TV series dataset consisting of over 100,000 annotated faces.
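
    A minimal late-fusion sketch under simple assumptions (per-modality identity scores and fixed reliability weights); the paper's global fusion framework may combine the cues differently.

        import numpy as np

        def late_fusion(scores_by_modality, weights):
            # scores_by_modality: {"face": (K,) scores, "speech": (K,) ...}
            # over K candidate identities; weights: per-modality reliability.
            # Both the modalities and the weighted-sum rule are illustrative.
            fused = np.zeros_like(next(iter(scores_by_modality.values())),
                                  dtype=float)
            for name, scores in scores_by_modality.items():
                s = (scores - scores.mean()) / (scores.std() + 1e-8)
                fused += weights.get(name, 1.0) * s  # z-normalize, then sum
            return int(np.argmax(fused))  # index of the predicted identity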