
    A Study of an Improved BoVW-Based Similar-Image Retrieval Method Using Salient Feature Regions

    In conventional similar-image retrieval with Bag of Visual Words (BoVW), local features are extracted from an image and aggregated into a feature histogram, and the similarity between two images is computed as the similarity between their histograms. Although conventional BoVW has been successful in similar-image retrieval, it treats all feature points equally and therefore ignores the importance of the main object in the image, which leaves room for improvement. Many images circulating on the Internet today consist of two parts, a foreground containing the main object and a background, and the main object plays a decisive role in judging the semantics of the image. The background, by contrast, contains various elements unrelated to the image's semantics and can therefore degrade retrieval accuracy. This work takes the position that similar-image retrieval should emphasize the similarity of the main objects in the images, and we propose a BoVW-based retrieval method that focuses on that similarity. The proposed method assumes that the location of the foreground can be approximated by a salient feature region and examines the similarity of salient regions. Concretely, given an image, we first extract feature points described by 128-dimensional SIFT descriptors. Next, we binarize the saliency map obtained with Region-based Contrast for Salient Region Detection (RC) to determine the salient region, treat that region as the foreground containing the main object, and treat the rest as background. A foreground feature histogram is then formed from the foreground region. Finally, the similarity between two images is computed as the average of the similarity of their foreground histograms and the similarity of their global histograms obtained from the whole images. Because this emphasizes the similarity of the two foregrounds, improved retrieval accuracy can be expected. Experiments on the Caltech101 image database confirmed that the proposed method improves recall over conventional BoVW. These results were presented at the Technical Meeting on Pattern Recognition and Media Understanding (PRMU) in January 2014. (University of Electro-Communications)
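    A minimal sketch of the pipeline described above, in Python with OpenCV. The assumptions: the RC saliency detector is not bundled with OpenCV, so the fine-grained static saliency from opencv-contrib stands in for it; the visual codebook is assumed to have been built beforehand (e.g., by k-means over training SIFT descriptors); and histogram intersection is used as the otherwise unspecified histogram similarity.

```python
# Sketch of the proposed similarity: the average of a global BoVW-histogram
# similarity and a foreground-only histogram similarity, where the foreground
# is approximated by a binarized saliency map. The fine-grained static
# saliency below is a stand-in for the RC method, which OpenCV does not ship.
import cv2
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Quantize descriptors against a (k, 128) codebook and return an
    L1-normalized visual-word histogram."""
    if descriptors is None or len(descriptors) == 0:
        return np.zeros(len(codebook))
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)               # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def foreground_mask(gray):
    """Binarize a saliency map (Otsu) to approximate the salient foreground."""
    saliency = cv2.saliency.StaticSaliencyFineGrained_create()
    _, sal_map = saliency.computeSaliency(gray)
    sal_u8 = (sal_map * 255).astype(np.uint8)
    _, mask = cv2.threshold(sal_u8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask

def image_similarity(img_a, img_b, codebook):
    """Average of global and foreground histogram-intersection similarities."""
    sift = cv2.SIFT_create()
    hists = {"global": [], "fg": []}
    for img in (img_a, img_b):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        kps, desc = sift.detectAndCompute(gray, None)
        mask = foreground_mask(gray)
        in_fg = np.array([mask[int(k.pt[1]), int(k.pt[0])] > 0 for k in kps],
                         dtype=bool)
        hists["global"].append(bovw_histogram(desc, codebook))
        hists["fg"].append(bovw_histogram(desc[in_fg] if desc is not None else None,
                                          codebook))
    sims = [np.minimum(h1, h2).sum() for h1, h2 in hists.values()]
    return 0.5 * sum(sims)
```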

    The State of the Art of Medical Imaging Technology: from Creation to Archive and Back

    Medical imaging has established itself firmly in modern medicine and has revolutionized the medical industry over the last 30 years. Radiology was born of the discovery of X-rays by Nobel laureate Wilhelm Roentgen, and it has since led to the creation of large quantities of digital images as opposed to film-based media. While this rich supply of images provides immeasurable information that would otherwise be impossible to obtain, medical images pose great challenges: they must be archived safely against corruption, loss, and misuse; remain retrievable from databases of huge size with varying forms of metadata; and stay reusable as new tools for data mining and new media for data storage become available. This paper provides a summative account of the creation of medical imaging tomography, the development of image archiving systems, and the innovation possible from the existing pools of acquired image data. The focus of this paper is content-based image retrieval (CBIR), in particular for 3D images, exemplified by our online e-learning system, MIRAGE, home to a repository of medical images spanning a variety of domains and dimensions. As its main novelty, the system implements CBIR for 3D images coupled with fully automatic image annotation, pointing towards versatile, flexible and sustainable medical image databases that can reap new innovations.
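    Since the abstract surveys CBIR rather than specifying an algorithm, the sketch below shows only the generic retrieval core, "describe, then rank", as a hedged illustration; the intensity-histogram feature is a placeholder and not MIRAGE's actual 3D descriptors, which the abstract does not detail.

```python
# Minimal content-based retrieval loop: images (2D slices or 3D volumes)
# are reduced to fixed-length feature vectors and ranked by distance to
# the query. The intensity histogram is a placeholder feature.
import numpy as np

def intensity_histogram(volume, bins=64):
    """Fixed-length intensity histogram; works for 2D or 3D arrays in [0, 1]."""
    hist, _ = np.histogram(volume.ravel(), bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def retrieve(query_vec, database):
    """database: {image_id: feature vector}; returns ids by ascending distance."""
    scored = sorted((np.linalg.norm(query_vec - v), k) for k, v in database.items())
    return [k for _, k in scored]
```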


    GOLD: Gaussians of Local Descriptors for Image Representation

    The Bag of Words paradigm has been the baseline from which several successful image classification solutions were developed in the last decade. These represent images by quantizing local descriptors and summarizing their distribution. The quantization step introduces a dependency on the dataset which, even though it significantly boosts performance in some contexts, severely limits the representation's generalization capability. In this paper, by contrast, we propose to model the distribution of local features with a multivariate Gaussian, without any quantization. The full-rank covariance matrix, which lies on a Riemannian manifold, is projected onto the tangent Euclidean space and concatenated to the mean vector. The resulting representation, a Gaussian of local descriptors (GOLD), allows the dot product to closely approximate a distance between distributions, without the need for expensive kernel computations. We describe an image by an improved spatial pyramid, which avoids boundary effects through soft assignment: local descriptors contribute to neighboring Gaussians, forming a weighted spatial pyramid of GOLD descriptors. In addition, we extend the model to leverage dataset characteristics in a mixture-of-Gaussians formulation, further improving classification accuracy. To deal with large-scale datasets and high-dimensional feature spaces, a stochastic gradient descent solver is adopted. Experimental results on several publicly available datasets show that the proposed method obtains state-of-the-art performance.
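    A hedged sketch of the core GOLD construction, assuming the standard log-Euclidean mapping for the tangent-space projection; the spatial pyramid, soft assignment, and mixture-of-Gaussians extensions described above are omitted.

```python
# GOLD sketch: fit one multivariate Gaussian to an image's local descriptors,
# map the SPD covariance to the tangent space via the matrix logarithm, and
# concatenate its upper triangle with the mean vector.
import numpy as np

def gold(descriptors, eps=1e-6):
    """descriptors: (n, d) array of local features from one image."""
    mu = descriptors.mean(axis=0)
    cov = np.cov(descriptors, rowvar=False)
    cov += eps * np.eye(cov.shape[0])          # ridge to ensure positive definiteness
    w, v = np.linalg.eigh(cov)                 # symmetric eigendecomposition
    log_cov = (v * np.log(w)) @ v.T            # matrix logarithm of the covariance
    iu = np.triu_indices_from(log_cov)
    # dot products between such vectors approximate a distance between Gaussians
    return np.concatenate([mu, log_cov[iu]])
```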

    Nearest-Neighbor based Metric Functions for indoor scene recognition

    Indoor scene recognition is a challenging problem in the classical scene recognition domain due to the severe intra-class variations and inter-class similarities of man-made indoor structures. State-of-the-art scene recognition techniques, such as those that capture a holistic representation of an image, perform poorly on indoor scenes. Other methods, which introduce intermediate steps such as identifying objects and associating them with scenes, face the difficulty of localizing and recognizing those objects in a highly cluttered and sophisticated environment. We propose a classification method that handles these difficulties by employing a metric function based on the nearest-neighbor classification procedure over the bag-of-visual-words scheme, the so-called codebooks. Viewing codebook construction as a Voronoi tessellation of the feature space, we observe that, given an image, a learned weighted distance from the extracted feature vectors to the centers of the Voronoi cells gives a strong indication of the image's category. Our method outperforms state-of-the-art approaches on an indoor scene recognition benchmark and achieves competitive results on a general scene dataset, using a single type of descriptor. © 2011 Elsevier Inc. All rights reserved.
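    The following sketch illustrates the kind of codebook-to-image distance the abstract describes: each descriptor is charged a weighted distance to the center of its Voronoi cell, and the image goes to the class with the smallest total. The weight-learning step itself is not reproduced; the per-class weight vectors are assumed to be given.

```python
# Weighted Voronoi-cell distance for nearest-neighbor style classification.
import numpy as np

def image_to_codebook_distance(descriptors, centers, weights):
    """descriptors: (n, d); centers: (k, d); weights: (k,) one per visual word."""
    d = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    nearest = d.argmin(axis=1)                 # Voronoi cell of each descriptor
    return (weights[nearest] * d[np.arange(len(d)), nearest]).mean()

def classify(descriptors, centers, class_weights):
    """class_weights: {label: (k,) weight vector learned for that class}."""
    scores = {c: image_to_codebook_distance(descriptors, centers, w)
              for c, w in class_weights.items()}
    return min(scores, key=scores.get)         # class with the smallest distance
```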

    On Building a Universal and Compact Visual Vocabulary

    Bag-of-visual-words has been shown to be a powerful image representation and has attained great success in many computer vision and pattern recognition applications. Usually, for a given dataset, researchers choose to build a specific visual vocabulary from that dataset, and the problem of deriving a universal visual vocabulary is rarely addressed. Based on previous work on classification performance with respect to visual vocabulary size, we arrive at the hypothesis that a universal visual vocabulary can be obtained by taking into account the extent of similarity among the keypoints represented by one visual word. We then propose a similarity-threshold-based clustering method to calculate the optimal vocabulary size, where the universal similarity threshold can be obtained empirically. With the optimal vocabulary size, the optimal visual vocabularies of limited size built from three datasets are shown to be exchangeable and therefore universal. This result indicates that a universal and compact visual vocabulary can be built from a dataset that is not too small. Our work narrows the gap between bag-of-visual-words and bag-of-words, where a relatively fixed vocabulary can be used across different text datasets.
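    A similarity-threshold clustering of the kind proposed above can be sketched as single-pass leader clustering, an illustrative reading rather than the authors' exact algorithm: a descriptor joins an existing visual word only if it is close enough to that word's center, otherwise it founds a new word, so the vocabulary size falls out of the threshold instead of being fixed in advance.

```python
# Single-pass threshold clustering: the distance threshold, not a preset k,
# determines the vocabulary size. max_dist plays the role of the empirical
# similarity threshold the paper proposes to calibrate.
import numpy as np

def threshold_vocabulary(descriptors, max_dist):
    centers, members = [], []
    for x in descriptors:
        if centers:
            d = np.linalg.norm(np.asarray(centers) - x, axis=1)
            j = int(d.argmin())
            if d[j] <= max_dist:                          # similar enough: join word j
                members[j].append(x)
                centers[j] = np.mean(members[j], axis=0)  # re-center the word
                continue
        centers.append(x.copy())                          # otherwise found a new word
        members.append([x])
    return np.asarray(centers)                            # size emerges from max_dist
```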

    Nearest-neighbor based metric functions for indoor scene recognition

    Ankara: The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2011. Thesis (Master's) -- Bilkent University, 2011. Includes bibliographical references, leaves 39-44.
    Indoor scene recognition is a challenging problem in the classical scene recognition domain due to the severe intra-class variations and inter-class similarities of man-made indoor structures. State-of-the-art scene recognition techniques, such as those that capture a holistic representation of an image, perform poorly on indoor scenes. Other methods, which introduce intermediate steps such as identifying objects and associating them with scenes, face the difficulty of localizing and recognizing those objects in a highly cluttered and sophisticated environment. We propose a classification method that handles these difficulties by employing a metric function based on the nearest-neighbor classification procedure over the bag-of-visual-words scheme, the so-called codebooks. Viewing codebook construction as a Voronoi tessellation of the feature space, we observe that, given an image, a learned weighted distance from the extracted feature vectors to the centers of the Voronoi cells gives a strong indication of the image's category. Our method outperforms state-of-the-art approaches on an indoor scene recognition benchmark and achieves competitive results on a general scene dataset, using a single type of descriptor. In this study, although our primary focus is indoor scene categorization, we also employ the proposed metric function to create a baseline implementation for the auto-annotation problem. With the growing amount of digital media, the problem of auto-annotating images with semantic labels has received significant interest from researchers in the last decade. Traditional approaches, in which such content is manually tagged, have proved too tedious and time-consuming. Hence, successfully labeling images with keywords describing their semantics is a crucial task yet to be accomplished.
    Çakır, Fatih. M.S.
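    As a hedged reading of the auto-annotation baseline mentioned above (the thesis's exact procedure is not spelled out here), labels can be transferred from the nearest training images under the learned metric:

```python
# Baseline auto-annotation by nearest-neighbor label transfer.
import numpy as np

def annotate(dist_to_train, train_tags, k=1):
    """dist_to_train: (n_train,) distances from the test image to each training
    image under the learned metric; train_tags: list of tag sets."""
    tags = set()
    for i in np.argsort(dist_to_train)[:k]:
        tags |= set(train_tags[i])             # transfer tags of the k nearest
    return tags
```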