Pairwise Quantization
We consider the task of lossy compression of high-dimensional vectors through quantization. We propose an approach that learns quantization parameters by minimizing the distortion of scalar products and squared distances between pairs of points. This is in contrast to previous works that obtain these parameters by minimizing the reconstruction error of individual points. The proposed approach proceeds by finding a linear transformation of the data that effectively reduces the minimization of the pairwise distortions to the minimization of individual reconstruction errors. After such a transformation, any of the previously proposed quantization approaches can be used. Despite the simplicity of this transformation, our experiments demonstrate that it achieves a considerable reduction of the pairwise distortions compared to applying quantization directly to the untransformed data.
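The pairwise objective above can be made concrete with a small sketch: measure the distortion of pairwise scalar products for a simple scalar quantizer applied directly and after a linear transform. The PCA rotation used here is only a stand-in for illustration; it is not the transformation learned in the paper.

```python
import numpy as np

# Illustrative sketch: pairwise scalar-product distortion for a uniform
# scalar quantizer, applied (a) directly and (b) after a linear transform.
# The PCA rotation is a stand-in, NOT the paper's learned transformation.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8)) * np.array([8, 4, 2, 1, 1, 0.5, 0.25, 0.1])

def quantize(Z, step=1.0):
    return np.round(Z / step) * step            # uniform scalar quantizer

def pairwise_distortion(A, B):
    # mean squared error over all pairwise scalar products <a_i, a_j>
    return np.mean((A @ A.T - B @ B.T) ** 2)

d_direct = pairwise_distortion(X, quantize(X))

T = np.linalg.svd(X - X.mean(0), full_matrices=False)[2]  # PCA rotation
Xq = quantize(X @ T.T) @ T                      # quantize in transformed space
d_transformed = pairwise_distortion(X, Xq)
```

The point of the sketch is the objective itself: distortion is measured on pairwise products, not on individual reconstructions.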
Segmentation Driven Object Detection with Fisher Vectors
We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks that suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor and an inter-category rescoring mechanism. Our experiments on the VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.
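The mask-based re-weighting idea can be sketched minimally, assuming local descriptors with known image positions. The mask, positions, and pooling below are illustrative only, not the paper's FV pipeline.

```python
import numpy as np

# Minimal sketch of mask-based feature re-weighting: each local descriptor
# is weighted by the tentative segmentation mask value at its position.
# All names and shapes here are illustrative assumptions.
rng = np.random.default_rng(1)
H, W, D = 32, 32, 16
mask = np.zeros((H, W))
mask[8:24, 8:24] = 1.0                      # tentative object segmentation mask
locs = rng.integers(0, H, size=(200, 2))    # (row, col) of each local feature
feats = rng.normal(size=(200, D))           # local descriptors (e.g. SIFT)

w = mask[locs[:, 0], locs[:, 1]]            # weight = mask value at location
pooled = (w[:, None] * feats).sum(axis=0) / max(w.sum(), 1e-8)
```

Features falling on background receive zero weight, so clutter outside the mask does not contribute to the pooled representation.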
A Novel Image Retrieval Based on a Combination of Local and Global Histograms of Visual Words
Content-based image retrieval (CBIR) provides a sustainable solution to retrieve similar images from an image archive. In the last few years, the Bag-of-Visual-Words (BoVW) model has gained attention and significantly improved the performance of image retrieval. In the standard BoVW model, an image is represented as an orderless global histogram of visual words, ignoring the spatial layout. The spatial layout of an image carries significant information that can enhance the performance of CBIR. In this paper, we present a novel image representation based on a combination of local and global histograms of visual words. The global histogram of visual words is constructed over the whole image, while the local histogram of visual words is constructed over a local rectangular region of the image. The local histogram contains spatial information about the salient objects. Extensive experiments and comparisons conducted on the Corel-A, Caltech-256, and Ground Truth image datasets demonstrate that the proposed image representation increases the performance of image retrieval.
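The combined representation can be sketched as two visual-word histograms concatenated; the central rectangular region chosen below is an illustrative assumption, not necessarily the region used in the paper.

```python
import numpy as np

# Sketch: global BoVW histogram over all descriptors, local histogram over
# descriptors in a central rectangular region, concatenated into one vector.
K = 50                                      # vocabulary size (illustrative)
rng = np.random.default_rng(2)
words = rng.integers(0, K, size=300)        # visual-word id of each descriptor
xy = rng.uniform(0.0, 1.0, size=(300, 2))   # normalized (x, y) positions

global_hist = np.bincount(words, minlength=K).astype(float)
inside = ((xy > 0.25) & (xy < 0.75)).all(axis=1)   # central rectangular region
local_hist = np.bincount(words[inside], minlength=K).astype(float)

representation = np.concatenate([
    global_hist / global_hist.sum(),
    local_hist / max(local_hist.sum(), 1.0),
])
```

The first half is orderless over the whole image; the second half carries the coarse spatial cue of which words fall in the region.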
Human Action Localization And Recognition In Unconstrained Videos
As imaging systems become ubiquitous, the ability to recognize human actions is becoming increasingly important. Just as in the object detection and recognition literature, action recognition can be roughly divided into classification tasks, where the goal is to classify a video according to the action depicted in the video, and detection tasks, where the goal is to detect and localize a human performing a particular action. A growing literature is demonstrating the benefits of localizing discriminative sub-regions of images and videos when performing recognition tasks. In this thesis, we address the action detection and recognition problems. Action detection in video is a particularly difficult problem because actions must not only be recognized correctly, but must also be localized in the 3D spatio-temporal volume. We introduce a technique that transforms the 3D localization problem into a series of 2D detection tasks. This is accomplished by dividing the video into overlapping segments, then representing each segment with a 2D video projection. The advantage of the 2D projection is that it makes it convenient to apply the best techniques from object detection to the action detection problem. We also introduce a novel, straightforward method for searching the 2D projections to localize actions, termed TwoPoint Subwindow Search (TPSS). Finally, we show how to connect the local detections in time using a chaining algorithm to identify the entire extent of the action. Our experiments show that video projection outperforms the latest results on action detection in a direct comparison. Second, we present a probabilistic model that learns to identify discriminative regions in videos from weakly supervised data, where each video clip is assigned only a label describing what action is present in the frame or clip.
While our first system requires every action to be manually outlined in every frame of the video, this second system only requires that the video be given a single high-level tag. From this data, the system is able to identify discriminative regions that correspond well to the regions containing the actual actions. Our experiments on both the MSR Action Dataset II and the UCF Sports Dataset show that the localizations produced by this weakly supervised system are comparable in quality to localizations produced by systems that require each frame to be manually annotated. This system is able to detect actions in both 1) non-temporally segmented action videos and 2) recognition tasks where a single label is assigned to the clip. We also demonstrate the action recognition performance of our method on two complex datasets, i.e. HMDB and UCF101. Third, we extend our weakly supervised framework by replacing the recognition stage with a two-stage neural network and apply dropout to prevent overfitting of the parameters on the training data. The dropout technique was recently introduced to prevent overfitting of the parameters in deep neural networks and has been applied successfully to the object recognition problem. To our knowledge, this is the first system using dropout for the action recognition problem. We demonstrate that using dropout improves the action recognition accuracies on the HMDB and UCF101 datasets.
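The segment-based decomposition described above (video split into overlapping segments, each then projected to 2D) can be sketched with a small hypothetical helper; the parameter values below are illustrative, not those used in the thesis.

```python
# Hypothetical helper: split a video of n_frames into fixed-length,
# overlapping frame ranges. Segment length and stride are illustrative.
def overlapping_segments(n_frames, length, stride):
    last_start = max(n_frames - length, 0)
    return [(s, s + length) for s in range(0, last_start + 1, stride)]
```

Each returned `(start, end)` range would then be rendered as one 2D projection and searched with a 2D detector.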
Computer Vision for Tissue Characterization and Outcome Prediction in Cancer
The aim of this dissertation was to investigate the use of computer vision for tissue characterization and patient outcome prediction in cancer. This work focused on the analysis of digitized tissue specimens, which were stained only for basic morphology (i.e. hematoxylin and eosin). The applicability of texture analysis and convolutional neural networks was evaluated for detection of biologically and clinically relevant features. Moreover, novel approaches to guide ground-truth annotation and to perform outcome-supervised learning for the prediction of patient survival directly from tumor tissue images without expert guidance were investigated.
We first studied quantification of tumor viability through segmentation of necrotic and viable tissue compartments. We developed a regional texture analysis method, which was trained and tested on whole sections of mouse xenograft models of human lung cancer. Our experiments showed that the proposed segmentation was able to discriminate between viable and non-viable tissue regions with high accuracy when compared to human expert assessment.
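As one concrete stand-in for a regional texture descriptor of the kind used above, a local binary pattern histogram can be computed per region; the thesis's actual texture features are not specified here, so this sketch is purely illustrative.

```python
import numpy as np

# A minimal 8-neighbour local binary pattern (LBP) histogram as a stand-in
# regional texture descriptor; illustrative only, not the thesis's method.
def lbp_histogram(img):
    center = img[1:-1, 1:-1]
    code = np.zeros(center.shape, dtype=np.int32)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        # neighbour plane shifted by (dy, dx), compared against the center
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= center).astype(np.int32) << bit
    return np.bincount(code.ravel(), minlength=256)
```

A classifier trained on such per-region histograms could then separate viable from necrotic regions, in the spirit of the regional analysis described above.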
We next investigated the feasibility of pre-trained convolutional neural networks in analysis of breast cancer tissue, aiming to quantify tumor-infiltrating lymphocytes in the specimens. Interestingly, our results showed that pre-trained convolutional neural networks can be adapted for analysis of histological image data, outperforming texture analysis. The results also indicated that the computerized assessment was on par with pathologist assessments. Moreover, the study presented an image annotation technique guided by specific antibody staining for improved ground-truth labeling.
Direct outcome prediction in breast cancer was then studied using a nationwide patient cohort. A computerized pipeline, which incorporated orderless feature aggregation and convolutional image descriptors for outcome-supervised classification, resulted in a risk grouping that was predictive of both disease-specific and overall survival. Surprisingly, further analysis suggested that the computerized risk prediction was also an independent prognostic factor that provided information complementary to the standard clinicopathological factors.
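The orderless feature aggregation mentioned above can be illustrated with mean pooling of per-tile convolutional descriptors into a single slide-level vector; mean pooling is shown as one simple orderless aggregator, and the pipeline's actual aggregation and classifier may differ.

```python
import numpy as np

# Sketch: aggregate per-tile convolutional descriptors into one slide-level
# vector. Mean pooling is orderless: tile order does not matter.
rng = np.random.default_rng(3)
tile_descriptors = rng.normal(size=(40, 512))   # 40 tissue tiles, 512-d each
slide_vector = tile_descriptors.mean(axis=0)    # orderless aggregation

# orderless property: shuffling the tiles leaves the aggregate unchanged
shuffled = tile_descriptors[rng.permutation(40)].mean(axis=0)
```

The slide-level vector would then feed an outcome-supervised classifier to produce the risk grouping.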
This doctoral thesis demonstrated how computer-vision methods can be powerful tools in the analysis of cancer tissue samples, highlighting strategies for supervised characterization of tissue entities and an approach for the identification of novel prognostic morphological features. Visual examination of the microscopic features of tissue samples is one of the most important assessments in the diagnosis of cancer patients and in treatment planning. Advanced imaging technologies have made it possible to digitize histological tumor tissue samples at high resolution. As a result of this digitization, advanced machine-learning-based computer vision methods can be applied to their analysis. This doctoral thesis investigates the application of computer vision methods to the computational analysis of cancer tissue samples. The work studies the automatic quantification of individual histological entities, such as necrotic tissue and immune cells. In addition, it presents a method for predicting patient survival based solely on tissue morphology.
Pattern recognition employing spatially variant unconstrained correlation filters
A spatial domain Optimal Trade-off Maximum Average Correlation Height (SPOT-MACH) filter is proposed in this thesis. The proposed technique uses a pre-defined, fixed-size kernel rather than estimation techniques. The spatial-domain implementation of OT-MACH offers the advantage that shift invariance is not imposed on it, as the kernel can be modified depending on its position within the input image. This allows normalization of the kernel and inclusion of a space-domain non-linearity to improve performance.
The proposed SPOT-MACH filter can be used to maximize the height of the correlation peak in the presence of distortions of the training object and provide resistance to background clutter. One of the major characteristics of the SPOT-MACH filter is that it can be tuned to maximize the height and sharpness of the correlation peak by using trade-offs between distortion tolerance, peak sharpness and the ability to suppress clutter noise.
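The position-dependent normalization that a spatial-domain implementation permits can be illustrated with plain normalized cross-correlation, where the response at every position is normalized against the local patch statistics. This is not the SPOT-MACH training procedure itself, only a sketch of the spatially variant idea.

```python
import numpy as np

# Sketch: spatial-domain correlation with per-position normalization
# (normalized cross-correlation). Illustrates why a spatial implementation
# need not be shift-invariant; NOT the SPOT-MACH filter itself.
def ncc_map(img, kernel):
    kh, kw = kernel.shape
    k = kernel - kernel.mean()
    k /= np.linalg.norm(k) + 1e-8               # unit-norm template
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            p = img[i:i + kh, j:j + kw]
            p = p - p.mean()                    # per-position normalization
            out[i, j] = (p * k).sum() / (np.linalg.norm(p) + 1e-8)
    return out
```

Because the normalization depends on the local patch under the kernel, the effective filter differs at every position, unlike a frequency-domain implementation.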
A number of non-parametric local regression techniques offer a simplified approach to pattern recognition problems, employing linear filtering with low-pass filters designed using moving-window local approximations. In most of these cases, the algorithms search for a region of interest near the point of estimation under various prevailing conditions that fit the required criteria. These estimates are calculated for a defined window size, determined as the largest area within which the estimators do not deviate widely from the criteria. The main drawback of this approach is that the required computational resources grow with the window size, so system performance suffers if the moving-window size is not proportionate to the available resources.
The proposed filter employs an optimization technique using low-pass filtering to highlight the potential regions of interest in the image and then restricts the movement of the kernel to these regions, allowing target identification with fewer computational resources. Another optimization technique is also proposed, based on an entropy filter that measures the degree of randomness between two changing scenes and returns the area where change has occurred, i.e. where the target object might be present. This approach gives a more accurate region of interest than the low-pass filtering approach.
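The entropy-based change measure can be sketched over non-overlapping windows of the difference image; the window size, bin count, and threshold below are hypothetical parameters, not the thesis's values.

```python
import numpy as np

# Illustrative sketch of an entropy filter between two scenes: windows of
# the difference image with high histogram entropy flag likely change
# regions. Parameters (win, bins, thresh) are hypothetical.
def entropy_change_map(a, b, win=8, bins=16, thresh=0.5):
    d = np.abs(a - b)
    H, W = d.shape
    flags = np.zeros((H // win, W // win), dtype=bool)
    for i in range(H // win):
        for j in range(W // win):
            block = d[i * win:(i + 1) * win, j * win:(j + 1) * win]
            hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
            p = hist[hist > 0] / hist.sum()
            flags[i, j] = -(p * np.log2(p)).sum() > thresh
    return flags
```

Unchanged windows have a near-degenerate difference histogram (entropy close to zero), while windows containing genuine change spread across bins and exceed the threshold.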
Apart from the software-based optimization approaches, two hardware-based enhancement techniques have also been proposed in this thesis. One approach employs a Field Programmable Gate Array (FPGA) to perform the correlation process using the inbuilt multipliers and look-up tables, and the other uses a Graphics Processing Unit (GPU) for parallel processing of the input scene.
Also in this thesis, a detailed analysis of SPOT-MACH has been carried out by comparing it with popular feature-based techniques such as the Scale Invariant Feature Transform (SIFT), and a comparison matrix has been created.
The proposed filter uses a two-stage approach, applying speed optimizations first and then detecting targets in input scenes. Both visible and Forward Looking Infrared (FLIR) imagery data sets have been used to test the performance of the filter.
Contextual Person Identification in Multimedia Data
We propose methods to improve automatic person identification, regardless of the visibility of a face, by integrating multiple cues, including multiple modalities and contextual information. We propose a joint learning approach using contextual information from videos to improve learned face models. Further, we integrate additional modalities in a global fusion framework. We evaluate our approaches on a novel TV series data set consisting of over 100 000 annotated faces.
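The multi-modal integration can be sketched as late score-level fusion; the fixed weights and two modalities below are illustrative assumptions, whereas the paper's global fusion framework is learned.

```python
import numpy as np

# Hypothetical sketch of late score-level fusion across modalities:
# per-identity scores from each modality are combined with fixed weights
# (illustrative; the paper learns its fusion).
def fuse_and_identify(modality_scores, weights):
    # modality_scores: list of per-identity score arrays, one per modality
    fused = sum(w * s for w, s in zip(weights, modality_scores))
    return int(np.argmax(fused))

face_scores = np.array([0.9, 0.1, 0.3])       # e.g. from a face model
context_scores = np.array([0.2, 0.6, 0.1])    # e.g. from contextual cues
identity = fuse_and_identify([face_scores, context_scores], [0.7, 0.3])
```

When the face is occluded, the face scores flatten and the contextual modality dominates the fused decision, which is the motivation for integrating cues beyond the face.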