Investigation on advanced image search techniques

Author: Verma Abhishek
Publication venue: Digital Commons @ NJIT
Publication date: 31/08/2011

Content-based image search for retrieval of images based on the similarity in their visual contents, such as color, texture, and shape, to a query image is an active research area due to its broad applications. Color, for example, provides powerful information for image search and classification. This dissertation investigates advanced image search techniques and presents new color descriptors for image search and classification and robust image enhancement and segmentation methods for iris recognition. First, several new color descriptors have been developed for color image search. Specifically, a new oRGB-SIFT descriptor, which integrates the oRGB color space and the Scale-Invariant Feature Transform (SIFT), is proposed for image search and classification. The oRGB-SIFT descriptor is further integrated with other color SIFT features to produce the novel Color SIFT Fusion (CSF), the Color Grayscale SIFT Fusion (CGSF), and the CGSF+PHOG descriptors for image category search with applications to biometrics. Image classification is implemented using a novel EFM-KNN classifier, which combines the Enhanced Fisher Model (EFM) and the K Nearest Neighbor (KNN) decision rule. Experimental results on four large scale, grand challenge datasets have shown that the proposed oRGB-SIFT descriptor improves recognition performance upon other color SIFT descriptors, and the CSF, the CGSF, and the CGSF+PHOG descriptors perform better than the other color SIFT descriptors. The fusion of both Color SIFT descriptors (CSF) and Color Grayscale SIFT descriptor (CGSF) shows significant improvement in the classification performance, which indicates that various color-SIFT descriptors and grayscale-SIFT descriptor are not redundant for image search. Second, four novel color Local Binary Pattern (LBP) descriptors are presented for scene image and image texture classification. Specifically, the oRGB-LBP descriptor is derived in the oRGB color space. 
The other three color LBP descriptors, namely the Color LBP Fusion (CLF), the Color Grayscale LBP Fusion (CGLF), and the CGLF+PHOG descriptors, are obtained by integrating the oRGB-LBP descriptor with additional image features. Experimental results on three large-scale, grand challenge datasets show that the proposed descriptors improve scene image and image texture classification performance. Finally, a new iris recognition method based on a robust iris segmentation approach is presented for improving iris recognition performance. The proposed segmentation approach applies power-law transformations for more accurate detection of the pupil region, which significantly reduces the candidate limbic boundary search space and thereby increases detection accuracy and efficiency. Because the limbic circle, whose center lies within a close range of the pupil center, is selectively detected, the eyelid detection approach leads to improved iris recognition performance. Experiments using the Iris Challenge Evaluation (ICE) database show the effectiveness of the proposed method.
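The power-law transformation mentioned in the abstract is commonly written as s = c * r^gamma on intensities normalized to [0, 1]. The following is a minimal sketch of that mapping, assuming 8-bit grayscale values stored in nested lists. The function name `power_law`, the sample pixels, and the gamma values are hypothetical choices for illustration and are not taken from the dissertation.

```python
def power_law(image, gamma, c=1.0):
    """Apply s = c * r**gamma to an 8-bit grayscale image (list of rows)."""
    out = []
    for row in image:
        new_row = []
        for pixel in row:
            r = pixel / 255.0                 # normalize to [0, 1]
            s = c * (r ** gamma)              # power-law mapping
            # Rescale to 8 bits and clamp to the valid range.
            new_row.append(min(255, max(0, round(s * 255))))
        out.append(new_row)
    return out


# Hypothetical 2x2 test image.
image = [[0, 64], [128, 255]]
darker = power_law(image, gamma=2.0)    # gamma > 1 darkens mid-tones
brighter = power_law(image, gamma=0.5)  # gamma < 1 brightens mid-tones
```

A gamma above 1 compresses dark and mid-range intensities, which is one plausible way to make a dark pupil region stand out; the setting actually used in the dissertation is not stated in the abstract.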

Digital Commons @ New Jersey Institute of Technology (NJIT)

Investigation of new feature descriptors for image search and classification

Author: Sinha Atreyee
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2014

Content-based image search, classification and retrieval is an active and important research area due to its broad applications as well as the complexity of the problem. Understanding the semantics and contents of images for recognition remains one of the most difficult and prevailing problems in the machine intelligence and computer vision community. With large variations in size, pose, illumination and occlusions, image classification is a very challenging task. A good classification framework should address the key issues of discriminatory feature extraction as well as efficient and accurate classification. Towards that end, this dissertation focuses on exploring new image descriptors by incorporating cues from the human visual system, and integrating local, texture, shape as well as color information to construct robust and effective feature representations for advancing content-based image search and classification. Based on the Gabor wavelet transformation, whose kernels are similar to the 2D receptive field profiles of the mammalian cortical simple cells, a series of new image descriptors is developed. Specifically, first, a new color Gabor-HOG (GHOG) descriptor is introduced by concatenating the Histograms of Oriented Gradients (HOG) of the component images produced by applying Gabor filters in multiple scales and orientations to encode shape information. Second, the GHOG descriptor is analyzed in six different color spaces and grayscale to propose different color GHOG descriptors, which are further combined to present a new Fused Color GHOG (FC-GHOG) descriptor. Third, a novel GaborPHOG (GPHOG) descriptor is proposed which improves upon the Pyramid Histograms of Oriented Gradients (PHOG) descriptor, and subsequently a new FC-GPHOG descriptor is constructed by combining the multiple color GPHOG descriptors and employing the Principal Component Analysis (PCA). 
Next, the Gabor-LBP (GLBP) descriptor is derived by accumulating the Local Binary Patterns (LBP) histograms of the local Gabor-filtered images to encode texture and local information of an image. Furthermore, a novel Gabor-LBP-PHOG (GLP) image descriptor is proposed, which integrates the GLBP and the GPHOG descriptors as a feature set, and an innovative Fused Color Gabor-LBP-PHOG (FC-GLP) descriptor is constructed by fusing the GLP from multiple color spaces. Subsequently, the GLBP and the GHOG descriptors are combined to produce the Gabor-LBP-HOG (GLH) feature vector, which performs well on different object and scene image categories. The six color GLH vectors are further concatenated to form the Fused Color GLH (FC-GLH) descriptor. Finally, the Wigner-based Local Binary Patterns (WLBP) descriptor is proposed, which combines multi-neighborhood LBP, the Pseudo-Wigner distribution of images, and the popular bag-of-words model to effectively classify scene images. To assess the feasibility of the proposed new image descriptors, two classification methods are used: one method applies the PCA and the Enhanced Fisher Model (EFM) for feature extraction and the nearest neighbor rule for classification, while the other method employs the Support Vector Machine (SVM). The classification performance of the proposed descriptors is tested on several popular, publicly available image datasets. 
The experimental results show that the proposed new image descriptors achieve image search and classification results better than or on par with other popular image descriptors, such as the Scale Invariant Feature Transform (SIFT), the Pyramid Histograms of visual Words (PHOW), the Pyramid Histograms of Oriented Gradients (PHOG), the Spatial Envelope (SE), the Color SIFT four Concentric Circles (C4CC), the Object Bank (OB), the Context Aware Topic Model (CA-TM), the Hierarchical Matching Pursuit (HMP), the Kernel Spatial Pyramid Matching (KSPM), the SIFT Sparse-coded Spatial Pyramid Matching (Sc-SPM), the Kernel Codebook (KC) and the LBP.
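The Gabor-HOG idea described in the abstract (filter the image with Gabor kernels at several scales and orientations, compute a gradient-orientation histogram for each filtered image, and concatenate the histograms) can be sketched as follows. This is a simplified stand-in, not the dissertation's implementation: it uses a single whole-image orientation histogram with no cells or block normalization, and the kernel size, sigma, wavelengths, orientations, and all function names are assumptions chosen for illustration.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a Gabor kernel: a Gaussian envelope times a cosine wave."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def filter_image(image, kernel):
    """Correlate the image with the kernel; out-of-range pixels count as zero."""
    h, w, half = len(image), len(image[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky, krow in enumerate(kernel):
                for kx, kval in enumerate(krow):
                    yy, xx = y + ky - half, x + kx - half
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kval
            out[y][x] = acc
    return out


def orientation_histogram(image, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    hist = [0.0] * bins
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle / (180.0 / bins)), bins - 1)
            hist[index] += math.hypot(gx, gy)
    return hist


def gabor_hog(image, thetas, wavelengths):
    """Concatenate orientation histograms of all Gabor-filtered images."""
    feature = []
    for wavelength in wavelengths:
        for theta in thetas:
            kernel = gabor_kernel(5, wavelength, theta, sigma=2.0)
            feature.extend(orientation_histogram(filter_image(image, kernel)))
    return feature


# Hypothetical 6x6 test image; 2 wavelengths x 2 orientations x 9 bins.
image = [[(x + y) % 7 for x in range(6)] for y in range(6)]
feature = gabor_hog(image, thetas=[0.0, math.pi / 2], wavelengths=[3.0, 5.0])
```

The resulting vector has one 9-bin histogram per filter, so its length grows linearly with the number of scales and orientations, which is presumably why the abstract pairs the fused descriptors with PCA.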

Digital Commons @ New Jersey Institute of Technology (NJIT)

Novel color and local image descriptors for content-based image search

Author: Banerji Sugata
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2013

Content-based image classification, search and retrieval is a rapidly-expanding research area. With the advent of inexpensive digital cameras, cheap data storage, fast computing speeds and ever-increasing data transfer rates, millions of images are stored and shared over the Internet every day. This necessitates the development of systems that can classify these images into various categories without human intervention and on being presented a query image, can identify its contents in order to retrieve similar images. Towards that end, this dissertation focuses on investigating novel image descriptors based on texture, shape, color, and local information for advancing content-based image search. Specifically, first, a new color multi-mask Local Binary Patterns (mLBP) descriptor is presented to improve upon the traditional Local Binary Patterns (LBP) texture descriptor for better image classification performance. Second, the mLBP descriptors from different color spaces are fused to form the Color LBP Fusion (CLF) and Color Grayscale LBP Fusion (CGLF) descriptors that further improve image classification performance. Third, a new HaarHOG descriptor, which integrates the Haar wavelet transform and the Histograms of Oriented Gradients (HOG), is presented for extracting both shape and local information for image classification. Next, a novel three Dimensional Local Binary Patterns (3D-LBP) descriptor is proposed for color images by encoding both color and texture information for image search. Furthermore, the novel 3DLH and 3DLH-fusion descriptors are proposed, which combine the HaarHOG and the 3D-LBP descriptors by means of Principal Component Analysis (PCA) and are able to improve upon the individual HaarHOG and 3D-LBP descriptors for image search. Subsequently, the innovative H-descriptor, and the H-fusion descriptor are presented that improve upon the 3DLH descriptor. 
Finally, the innovative Bag of Words-LBP (BoWL) descriptor is introduced that combines the idea of LBP with a bag-of-words representation to further improve image classification performance. To assess the feasibility of the proposed new image descriptors, two classification frameworks are used. In one, the PCA and the Enhanced Fisher Model (EFM) are applied for feature extraction and the nearest neighbor classification rule for classification. In the other, a Support Vector Machine (SVM) is used for classification. The classification performance is tested on several widely used and publicly available image datasets. The experimental results show that the proposed new image descriptors achieve an image classification performance better than or comparable to other popular image descriptors, such as the Scale Invariant Feature Transform (SIFT), the Pyramid Histograms of visual Words (PHOW), the Pyramid Histograms of Oriented Gradients (PHOG), the Spatial Envelope (SE), the Color SIFT four Concentric Circles (C4CC), the Object Bank (OB), the Hierarchical Matching Pursuit (HMP), the Kernel Spatial Pyramid Matching (KSPM), the SIFT Sparse-coded Spatial Pyramid Matching (ScSPM), the Kernel Codebook (KC) and the LBP
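For context, the traditional LBP texture descriptor that the mLBP, 3D-LBP, and BoWL descriptors build on can be sketched in a few lines. The example below shows only the basic 3x3 operator and its histogram, assuming a grayscale image stored as nested lists; the neighbor ordering, the `>=` comparison convention, and the function names are illustrative choices, and none of the dissertation's multi-mask or color extensions are reproduced.

```python
def lbp_code(image, y, x):
    """Return the 8-bit LBP code of pixel (y, x) from its 3x3 neighborhood."""
    center = image[y][x]
    # Neighbors visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        # A neighbor at least as bright as the center sets its bit.
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[lbp_code(image, y, x)] += 1
    return hist


# Hypothetical 3x3 patch with a single interior pixel.
image = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
code = lbp_code(image, 1, 1)   # bits 1, 3, and 4 are set: 2 + 8 + 16 = 26
hist = lbp_histogram(image)
```

Because each bit depends only on whether a neighbor is brighter than the center, the code is unchanged by any monotonic increase in intensity, which is the property that makes LBP histograms useful as texture features.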

Digital Commons @ New Jersey Institute of Technology (NJIT)

Face recognition using multiple features in different color spaces

Author: Liu Zhiming
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/2011

Face recognition as a particular problem of pattern recognition has been attracting substantial attention from researchers in computer vision, pattern recognition, and machine learning. The recent Face Recognition Grand Challenge (FRGC) program reveals that uncontrolled illumination conditions pose grand challenges to face recognition performance. Most of the existing face recognition methods use gray-scale face images, which have been shown insufficient to tackle these challenges. To overcome this challenging problem in face recognition, this dissertation applies multiple features derived from the color images instead of the intensity images only. First, this dissertation presents two face recognition methods, which operate in different color spaces, using frequency features by means of Discrete Fourier Transform (DFT) and spatial features by means of Local Binary Patterns (LBP), respectively. The DFT frequency domain consists of the real part, the imaginary part, the magnitude, and the phase components, which provide the different interpretations of the input face images. The advantage of LBP in face recognition is attributed to its robustness in terms of intensity-level monotonic transformation, as well as its operation in the various scale image spaces. By fusing the frequency components or the multi-resolution LBP histograms, the complementary feature sets can be generated to enhance the capability of facial texture description. This dissertation thus uses the fused DFT and LBP features in two hybrid color spaces, the RIQ and the VIQ color spaces, respectively, for improving face recognition performance. Second, a method that extracts multiple features in the CID color space is presented for face recognition. 
As different color component images in the CID color space display different characteristics, three different image encoding methods, namely the patch-based Gabor image representation, the multi-resolution LBP feature fusion, and the DCT-based multiple face encodings, are presented to effectively extract features from the component images for enhancing pattern recognition performance. To further improve classification performance, the similarity scores from the three color component images are fused for the final decision. Finally, a novel image representation is also discussed in this dissertation. Unlike a traditional intensity image, which is directly derived from a linear combination of the R, G, and B color components, the novel image representation, adapted to class separability, is generated through a PCA plus FLD learning framework from the hybrid color space instead of the RGB color space. Based upon the novel image representation, a multiple feature fusion method is proposed to address the problem of face recognition under severe illumination conditions. The aforementioned methods have been evaluated using two large-scale databases, namely the Face Recognition Grand Challenge (FRGC) version 2 database and the FERET face database. Experimental results show that the proposed methods improve face recognition performance over traditional methods that use intensity images by large margins and outperform some state-of-the-art methods.
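The four DFT components named in the abstract (real part, imaginary part, magnitude, and phase) can be illustrated on a tiny image. The sketch below uses a naive 2D DFT written with the standard library only, which is practical just for very small inputs; the function names and the 2x2 sample image are hypothetical, and the color-space conversion and feature fusion steps of the dissertation are not shown.

```python
import cmath
import math


def dft2(image):
    """Naive 2D discrete Fourier transform of a small grayscale image."""
    rows, cols = len(image), len(image[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * math.pi * (u * y / rows + v * x / cols)
                    total += image[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def components(spectrum):
    """Split a spectrum into real, imaginary, magnitude, and phase maps."""
    real = [[z.real for z in row] for row in spectrum]
    imag = [[z.imag for z in row] for row in spectrum]
    magnitude = [[abs(z) for z in row] for row in spectrum]
    phase = [[cmath.phase(z) for z in row] for row in spectrum]
    return real, imag, magnitude, phase


# Hypothetical 2x2 image; the DC term equals the sum of all pixels (10).
image = [[1, 2], [3, 4]]
real, imag, magnitude, phase = components(dft2(image))
```

Each of the four maps has the same shape as the input, so they can be treated as separate feature images before any fusion step.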

Digital Commons @ New Jersey Institute of Technology (NJIT)

State of the Art in Face Recognition

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021

Notwithstanding the tremendous effort to solve the face recognition problem, it is not yet possible to design a face recognition system with a potential close to human performance. New computer vision and pattern recognition approaches need to be investigated. New knowledge and perspectives from different fields, such as psychology and neuroscience, must also be incorporated into the field of face recognition to design a robust face recognition system. Indeed, much more effort is required to arrive at a human-like face recognition system. This book attempts to reduce the gap between the previous state of face recognition research and the future state.

Directory of Open Access Books (DOAB)

DATA FUSION APPROACHES IN SPECTROSCOPIC CHARACTERIZATION AND CLASSIFICATION OF PDO WINE VINEGARS

Author: Amigo J. M.
Callejón R. M.
Cocchi M.
Rios-Reina R.
Savorani F.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

Spain is one of the major producers of high-quality wine vinegars and has three protected designations of origin (PDOs): "Vinagre de Jerez", "Vinagre de Condado de Huelva" and "Vinagre de Montilla-Moriles". Their high prices, which reflect their high quality and high production costs, explain the need for an adequate quality control technique and the interest in extensive characterization to capture the identity of each denomination. In this framework, methodologies based on non-targeted techniques, such as spectroscopies, are becoming popular in food authentication. Thus, for improving vinegar quality assessment, fusing data blocks obtained from the same samples with different analytical techniques could be a good strategy, since the quantity and quality of sample knowledge could be enhanced, providing new insights into the differentiation of vinegars. Therefore, the aim of this manuscript is the development of a multi-platform methodology and a model able to classify the Spanish wine vinegar PDOs. Sixty-five PDO wine vinegars were analyzed by four spectroscopic techniques: Fourier transform mid-infrared spectroscopy (MIR), near infrared spectroscopy (NIR), multidimensional fluorescence spectroscopy (EEM) and proton nuclear magnetic resonance (1H-NMR). Two different data fusion strategies were evaluated: mid-level data fusion with different preprocessing, and the Common Component and Specific Weights analysis multiblock method. Exploratory and classification analyses of the data from the individual techniques were also performed and compared with the data fusion models. The data fusion models improved the classification, providing a more efficient differentiation than the models based on single methods and supporting the approach of combining these methods to achieve synergies for an optimized PDO differentiation.

Copenhagen University Research Information System

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

idUS. Depósito de Investigación Universidad de Sevilla

Effective Uni-Modal to Multi-Modal Crowd Estimation based on Deep Neural Networks

Author: SAJID USMAN
Publication venue: 'Paleontological Institute at The University of Kansas'
Publication date: 01/01/2021

Crowd estimation is a vital component of crowd analysis. It finds many applications in real-world scenarios, e.g. the management of huge gatherings such as the Hajj, sporting and musical events, or political rallies. Automated crowd counting facilitates better and more effective management of such events and consequently helps prevent undesired situations. This is a very challenging problem in practice because of significant differences in crowd numbers within and across images, varying image resolution, large perspective, severe occlusions, and dense crowd-like cluttered background regions. Current approaches do not handle huge crowd diversity well and thus perform poorly in cases ranging from extremely low to high crowd density, yielding huge crowd underestimation or overestimation. Manual crowd counting is also infeasible because it is very slow and inaccurate. To address these major crowd counting issues and challenges, we investigate two different types of input data: uni-modal (image) and multi-modal (image and audio). In the uni-modal setting, we propose and analyze four novel end-to-end crowd counting networks, ranging from multi-scale fusion-based models to uni-scale one-pass and two-pass multitask networks. The multi-scale networks employ the attention mechanism to enhance the model efficacy. The uni-scale models, on the other hand, are equipped with a novel, simple yet effective patch re-scaling module (PRM) that functions identically to, but is more lightweight than, multi-scale approaches. Experimental evaluation demonstrates that the proposed networks outperform the state-of-the-art in the majority of cases on four different benchmark datasets, with up to 12.6% improvement in the RMSE evaluation metric. The better cross-dataset performance also validates the better generalization ability of our schemes. For the multi-modal input, effective feature extraction (FE) and strong information fusion between the two modalities remain a big challenge. 
Thus, the novel multi-modal network design focuses on investigating different feature fusion techniques while improving the FE. Based on a comprehensive experimental evaluation, the proposed multi-modal network increases the performance under all standard evaluation criteria, with up to 33.8% improvement in comparison to the state-of-the-art. The multi-scale uni-modal attention networks also prove effective in other deep learning domains, as demonstrated by better performance on seven different scene-text recognition datasets.
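The RMSE metric cited in the abstract, and one common way of expressing a percentage improvement over a baseline, can be sketched as below. The counts are made-up numbers for illustration only, and reading the reported percentages as a relative error reduction is an assumption; the abstract does not state how they were computed.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and true crowd counts."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(actual))


def relative_improvement(baseline_error, new_error):
    """Percentage reduction in error relative to a baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical per-image crowd counts (not from the thesis).
actual = [100, 250, 40]
baseline = [110, 230, 50]
proposed = [104, 242, 46]

baseline_rmse = rmse(baseline, actual)
proposed_rmse = rmse(proposed, actual)
gain = relative_improvement(baseline_rmse, proposed_rmse)
```

Because errors are squared before averaging, RMSE penalizes a few large miscounts more heavily than many small ones, which matters for images with extreme crowd densities.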

KU ScholarWorks