54 research outputs found

    Toward Large Scale Semantic Image Understanding and Retrieval

    Semantic image retrieval is a multifaceted, highly complex problem. Not only does a solution require advanced image processing and computer vision techniques, but it also requires knowledge beyond what can be inferred from the image content alone. In contrast, traditional image retrieval systems are based on keyword searches over filenames or metadata tags, e.g. Google image search, Flickr search, etc. These conventional systems do not analyze the image content, and their keywords are not guaranteed to represent the image. Thus, there is a significant need for a semantic image retrieval system that can analyze and retrieve images based on the content and relationships that exist in the real world. In this thesis, I present a framework that moves toward advancing semantic image retrieval on large scale datasets. At a conceptual level, semantic image retrieval requires the following steps: viewing an image, understanding the content of the image, indexing the important aspects of the image, connecting the image concepts to the real world, and finally retrieving images based on the indexed concepts or related concepts. My proposed framework addresses each of these components toward my ultimate goal of improving image retrieval. The first task is the essential one of understanding the content of an image. Unfortunately, the only data typically used by a computer algorithm when analyzing images is the low-level pixel data. To achieve human-level comprehension, a machine must overcome the semantic gap, the disparity that exists between the image data and human understanding. This translation of low-level information into a high-level representation is an extremely difficult problem that requires more than the pixel information. I describe my solution to this problem through the use of an online knowledge acquisition and storage system.
This system utilizes the extensible, visual, and interactive properties of Scalable Vector Graphics (SVG) combined with online crowdsourcing tools to collect high-level knowledge about visual content. I further describe the utilization of knowledge and semantic data for image understanding. Specifically, I seek to incorporate into various algorithms knowledge that cannot be inferred from the image pixels alone. This information comes from related images or structured data (in the form of hierarchies and ontologies) and improves the performance of object detection and image segmentation tasks. These understanding tasks are crucial intermediate steps toward retrieval and semantic understanding. However, typical object detection and segmentation tasks require an abundance of training data for machine learning algorithms; the prior training data indicates what patterns and visual features an algorithm should look for when processing an image. In contrast, my algorithm utilizes related semantic images to extract the visual properties of an object and to decrease the search space of my detection algorithm. Furthermore, I demonstrate the use of related images in the image segmentation process. Again, without the use of prior training data, I present a method for foreground object segmentation that finds the shared area in a set of images. I demonstrate the effectiveness of my method on structured image datasets that have defined relationships between classes, e.g. parent-child or sibling classes. Finally, I introduce my framework for semantic image retrieval. I enhance the proposed knowledge acquisition and image understanding techniques with semantic knowledge through linked data and semantic web languages. This is an essential step in semantic image retrieval.
For example, an image processing algorithm that classifies a car but is not enhanced by external knowledge has no way of knowing that a car is a type of vehicle, highly related to a truck and less related to other modes of transportation such as a train. However, a query for modes of human transportation should return all of the mentioned classes. Thus, I demonstrate how to integrate information from both image processing algorithms and semantic knowledge bases to perform interesting queries that would otherwise be impossible. The key component of this system is a novel property reasoner that is able to translate low-level image features into semantically relevant object properties. I use a combination of XML-based languages such as SVG, RDF, and OWL to link to existing ontologies available on the web. My experiments demonstrate an efficient data collection framework and a novel utilization of semantic data for image analysis and retrieval on datasets of people and landmarks collected from sources such as IMDB and Flickr. Ultimately, my thesis presents improvements to the state of the art in visual knowledge representation/acquisition and in computer vision algorithms such as detection and segmentation, toward the goal of enhanced semantic image retrieval.
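
The car/vehicle/train example above can be sketched as a tiny "is-a" hierarchy query. This is only an illustrative stand-in for the thesis's RDF/OWL linked-data machinery: the class names and the dictionary-based ontology below are invented for the sketch, not taken from the actual system.

```python
# Toy "is-a" hierarchy standing in for a linked-data ontology.
# All class names here are illustrative, not the thesis's data.
SUBCLASS_OF = {
    "car": "vehicle",
    "truck": "vehicle",
    "train": "vehicle",
    "vehicle": "transportation",
}

def ancestors(cls):
    """Walk the subclass chain from a class up to the root."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain

def matches_query(detected_class, query_concept):
    """A detection matches a query if the query concept is the class
    itself or one of its ancestors, so a 'transportation' query also
    returns images whose detector only said 'car'."""
    return query_concept == detected_class or query_concept in ancestors(detected_class)

detections = ["car", "truck", "train", "dog"]
print([c for c in detections if matches_query(c, "transportation")])
# -> ['car', 'truck', 'train']
```

A real implementation would issue the equivalent query against web ontologies via RDF/OWL rather than an in-memory dictionary, but the retrieval logic is the same shape.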

    Multi-feature Fusion Using Scale Invariant Feature Transform and Local Extensive Binary Pattern Features for Finger Vein Recognition

    Finger vein recognition is one of the areas in the field of biometrics. It shares the same stages as other biometric recognition processes: image acquisition, preprocessing, feature extraction, and matching. The success rate of the matching stage is determined by the selection of features. Finger vein images are susceptible to changes in scale, rotation, and translation, so features that are robust to these conditions are important. The Scale Invariant Feature Transform (SIFT) feature has been widely used for image matching and is able to withstand image degradation due to changes in scale, rotation, and translation. However, SIFT features provide less than optimal results when extracted from images with gray-level variations, such as those caused by differences in lighting intensity. The Local Extensive Binary Pattern (LEBP) feature is resistant to gray-level variations and carries richer, more discriminative local characteristic information. Therefore, a fusion technique is used to combine information from the SIFT and LEBP features, producing a feature that resists degradation caused by changes in scale, rotation, translation, and gray-level variations such as those caused by differences in lighting intensity. This study proposes multi-feature fusion using SIFT and LEBP features for finger vein recognition.
The fused feature is processed with the Learning Vector Quantization (LVQ) method to determine whether a test image can be recognized or not. By using multi-feature fusion, the resulting feature representation is expected to improve the accuracy of finger vein recognition even when features are taken from degraded images. Experimental results show that multi-feature fusion with SIFT and LEBP features gives better results than using either single feature alone. This can be seen from the improved system performance in the optimum condition, with an accuracy of 97.50%, a TPR of 0.9400, and an FPR of 0.0128.
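
A simple way to fuse two descriptor types, used here only as a hedged sketch of the general idea (the paper does not specify this exact scheme), is to L2-normalize each modality and concatenate the results so neither dominates the distance computation. The dimensions below are placeholders: 128 is the standard SIFT descriptor length, while the LEBP length is invented for the example.

```python
import numpy as np

def l2_normalize(v, eps=1e-8):
    """Scale a vector to unit length; eps guards against zero vectors."""
    return v / (np.linalg.norm(v) + eps)

def fuse(sift_vec, lebp_vec):
    # Normalize each modality separately before concatenating so that
    # neither feature dominates the fused representation.
    return np.concatenate([l2_normalize(sift_vec), l2_normalize(lebp_vec)])

rng = np.random.default_rng(0)
sift = rng.random(128)   # one SIFT descriptor (128-D is standard)
lebp = rng.random(59)    # placeholder LEBP histogram length
fused = fuse(sift, lebp)
print(fused.shape)       # (187,)
```

The fused vector then serves as the input to a classifier such as LVQ.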

    Superpixels: An Evaluation of the State-of-the-Art

    Superpixels group perceptually similar pixels to create visually meaningful entities while heavily reducing the number of primitives for subsequent processing steps. Owing to these properties, superpixel algorithms have received much attention since their naming in 2003, and publicly available superpixel algorithms have turned into standard tools in low-level vision. As such, and due to their quick adoption in a wide range of applications, appropriate benchmarks are crucial for algorithm selection and comparison. Until now, the rapidly growing number of algorithms as well as varying experimental setups hindered the development of a unifying benchmark. We present a comprehensive evaluation of 28 state-of-the-art superpixel algorithms utilizing a benchmark focusing on fair comparison and designed to provide new insights relevant for applications. To this end, we explicitly discuss parameter optimization and the importance of strictly enforcing connectivity. Furthermore, by extending well-known metrics, we are able to summarize algorithm performance independent of the number of generated superpixels, thereby overcoming a major limitation of available benchmarks. We also discuss runtime, robustness against noise, blur, and affine transformations, implementation details, and aspects of visual quality. Finally, we present an overall ranking of superpixel algorithms which redefines the state of the art and enables researchers to easily select appropriate algorithms and the corresponding implementations, which are made publicly available as part of our benchmark at davidstutz.de/projects/superpixel-benchmark/.
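
The "strictly enforcing connectivity" step mentioned above can be illustrated with a small sketch (this is a generic relabeling pass, not the benchmark's actual implementation): a superpixel label map may assign the same label to spatially disconnected regions, and relabeling 4-connected components ensures every superpixel is a single contiguous region.

```python
import numpy as np
from collections import deque

def enforce_connectivity(labels):
    """Relabel a 2-D label map so each output label is one 4-connected
    component; disconnected regions sharing a label are split apart."""
    h, w = labels.shape
    out = -np.ones_like(labels)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if out[sy, sx] != -1:
                continue
            # Flood-fill (BFS) over same-label 4-neighbours.
            q = deque([(sy, sx)])
            out[sy, sx] = next_label
            while q:
                y, x = q.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and out[ny, nx] == -1
                            and labels[ny, nx] == labels[y, x]):
                        out[ny, nx] = next_label
                        q.append((ny, nx))
            next_label += 1
    return out

# Label 0 appears in four disconnected corners -> split into four labels.
lab = np.array([[0, 1, 0],
                [1, 1, 1],
                [0, 1, 0]])
print(len(np.unique(enforce_connectivity(lab))))  # 5 components
```

Production implementations additionally merge components below a minimum size into a neighbour, which this sketch omits.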

    Medical Image Retrieval using Bag of Meaningful Visual Words: Unsupervised visual vocabulary pruning with PLSA

    Content-based medical image retrieval has been proposed as a technique that allows not only easy access to images from the relevant literature and electronic health records but also supports physician training, research, and clinical decision support. The bag-of-visual-words approach is a widely used technique that tries to bridge the semantic gap by learning meaningful features from the dataset and describing documents and images in terms of histograms of these features. Visual vocabularies are often redundant, over-complete, and noisy. Larger than required vocabularies lead to high-dimensional feature spaces, which present important disadvantages, the curse of dimensionality and computational cost being the most obvious ones. In this work a visual vocabulary pruning technique is presented. It greatly reduces the number of words required to describe a medical image dataset with no significant effect on accuracy. Results show that a reduction of up to 90% can be achieved without impact on system performance. Obtaining a more compact representation of a document enables multimodal description as well as the use of classifiers requiring low-dimensional representations.
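
The pipeline shape of vocabulary pruning can be sketched as follows. Note the scoring criterion here (per-word variance across the corpus) is a deliberately simple stand-in, not the paper's PLSA-based relevance; only the in/out interface is the same: histograms in, a reduced set of visual words out.

```python
import numpy as np

# Synthetic bag-of-visual-words histograms: 200 images, 1000-word
# vocabulary. All numbers are made up for the sketch.
rng = np.random.default_rng(1)
n_docs, vocab = 200, 1000
histograms = rng.poisson(0.3, size=(n_docs, vocab)).astype(float)

def prune_vocabulary(H, keep_fraction=0.1):
    """Keep only the top-scoring fraction of visual words.

    PLSA would score words by their relevance to latent topics; the
    per-word variance used here is only a placeholder criterion."""
    scores = H.var(axis=0)
    k = int(H.shape[1] * keep_fraction)
    keep = np.argsort(scores)[-k:]
    return H[:, keep], keep

pruned, kept_words = prune_vocabulary(histograms, keep_fraction=0.1)
print(pruned.shape)  # (200, 100): 90% of the vocabulary removed
```

The paper's reported result, a 90% vocabulary reduction with no significant accuracy loss, corresponds to `keep_fraction=0.1` in this interface.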

    Multi-feature Fusion Using SIFT and LEBP for Finger Vein Recognition

    In this paper, multi-feature fusion using the Scale Invariant Feature Transform (SIFT) and the Local Extensive Binary Pattern (LEBP) is proposed to obtain a feature that can resist degradation problems such as scaling, rotation, translation, and varying illumination conditions. The SIFT feature can withstand image degradation due to changes in scale, rotation, and translation, while the LEBP feature is resistant to gray-level variations and carries richer, more discriminative local characteristic information. Therefore, a fusion technique is used to collect the important information from both the SIFT and LEBP features. The resulting fused feature is processed with the Learning Vector Quantization (LVQ) method to determine whether a test image can be recognized or not. The accuracy reaches 97.50%, with a TPR of 0.9400 and an FPR of 0.0128 in the optimum condition, a better result than using the SIFT or LEBP feature alone.
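
The LVQ classification step used by this and the neighbouring papers can be sketched with the basic LVQ1 update rule: the nearest prototype is pulled toward a training sample when their labels agree and pushed away otherwise. This is a generic LVQ1 sketch on toy 2-D data, not the papers' actual configuration; learning rate, epochs, and the cluster positions are all invented for the example.

```python
import numpy as np

def lvq1_train(X, y, prototypes, proto_labels, lr=0.1, epochs=20):
    """Basic LVQ1: move the winning prototype toward (same label)
    or away from (different label) each training sample."""
    P = prototypes.astype(float).copy()
    for _ in range(epochs):
        for x, label in zip(X, y):
            i = np.argmin(np.linalg.norm(P - x, axis=1))  # winner
            sign = 1.0 if proto_labels[i] == label else -1.0
            P[i] += sign * lr * (x - P[i])
    return P

def lvq_predict(X, P, proto_labels):
    """Assign each sample the label of its nearest prototype."""
    d = np.linalg.norm(P[None] - X[:, None], axis=2)  # (n_samples, n_protos)
    return proto_labels[np.argmin(d, axis=1)]

# Two toy 2-D classes standing in for fused feature vectors.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(2, 0.3, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
proto_labels = np.array([0, 1])
P = lvq1_train(X, y, np.array([[0.5, 0.5], [1.5, 1.5]]), proto_labels)
acc = (lvq_predict(X, P, proto_labels) == y).mean()
print(acc)  # high accuracy on these well-separated clusters
```

In the papers, the inputs would be fused SIFT+LEBP vectors and the two classes would be "recognizable" versus "non-recognizable" images.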

    A Comparative Study of Finger Vein Recognition by Using Learning Vector Quantization

    This paper presents a comparative study of finger vein recognition using various features with Learning Vector Quantization (LVQ) as the classification method. Two main features are employed: the Scale Invariant Feature Transform (SIFT) and the Local Extensive Binary Pattern (LEBP). The features that form LEBP, the Local Multilayer Binary Pattern (LmBP) and the Local Directional Binary Pattern (LdBP), are also employed. The type of image is also a basis of comparison: SIFT features are extracted from both grayscale and binary images. The extracted features become the input to the recognition stage, where an LVQ classifier assigns images to one of two classes, recognizable or non-recognizable. Accuracy, false positive rate (FPR), and true positive rate (TPR) are used to evaluate the performance of finger vein recognition, and these results form the basis of the comparison. From the experimental results, it can be determined which feature is best for finger vein recognition using LVQ. Recognition using SIFT features from binary images gives slightly better results than using the LmBP, LdBP, or LEBP features. The accuracy reaches 97.45%, with a TPR of 0.9000 and an FPR of 0.0129.
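
The three reported metrics are simple ratios of confusion-matrix counts. The counts below are made up to illustrate the formulas; they are not the paper's data.

```python
def recognition_metrics(tp, fp, tn, fn):
    """Accuracy, TPR, and FPR from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)  # true positive rate (sensitivity)
    fpr = fp / (fp + tn)  # false positive rate
    return accuracy, tpr, fpr

# Illustrative counts only, not taken from the paper.
acc, tpr, fpr = recognition_metrics(tp=90, fp=2, tn=154, fn=10)
print(round(acc, 4), round(tpr, 4), round(fpr, 4))  # 0.9531 0.9 0.0128
```

A good recognizer drives TPR toward 1 and FPR toward 0; accuracy alone can be misleading when the two classes are imbalanced, which is why the papers report all three.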
