
13 research outputs found

Food Recognition and Volume Estimation in a Dietary Assessment System

Author: Rahman Md. Hafizur
Publication venue: UNSW, Sydney
Publication date: 01/01/2013

Obesity has recently become an epidemic and one of the most serious worldwide public health concerns of the 21st century. It diminishes average life expectancy, and there is now convincing evidence that poor diet, in combination with physical inactivity, is a key determinant of an individual's risk of developing chronic diseases such as cancer, cardiovascular disease or diabetes. Assessing what people eat is fundamental to establishing the link between diet and disease. Food records are considered the best approach for assessing energy intake. However, this method requires literate and highly motivated subjects, which is a particular problem for adolescents and young adults, who are the least likely to keep food records. The ready access of the majority of the population to mobile phones (with integrated camera, improved memory capacity, network connectivity and faster processing capability) has opened up new opportunities for dietary assessment. The information extracted from dietary assessment provides valuable insights into the causes of disease and helps practicing dietitians and researchers to develop intervention programs for prevention. In such systems, the camera in the mobile phone is used to capture images of the food consumed, and these images are then processed to automatically estimate the nutritional content of the food. However, food items are deformable objects that vary in appearance, shape, texture and color, so food classification and volume estimation in these systems suffer from low accuracy. Improving both food recognition accuracy and volume estimation accuracy is a challenging task. This thesis presents new techniques for food classification and food volume estimation. For food recognition, emphasis was given to texture features.
Existing food recognition techniques assume that food images will be viewed at similar scales and from the same viewpoints. This assumption fails in practical applications, because it is difficult to ensure that a user of a dietary assessment system will hold the camera at the same scale and orientation as was used for the target food images in the database. A new scale- and rotation-invariant feature generation approach that applies Gabor filter banks is proposed. To obtain scale and rotation invariance, the proposed approach identifies the dominant orientation of the filtered coefficients and applies a circular shifting operation to place this value at the first scale of the dominant direction. The advantages of this technique are that it does not require the scale factor to be known in advance and that it is invariant to scale and to rotation, both separately and concurrently. The approach is then modified to improve accuracy by applying a Gaussian window along the scale dimension, which reduces the impact of the high and low frequencies of the filter outputs and enables better matching within the same class. Besides automatic classification, semi-automatic classification and group classification are also considered in order to gauge the improvement. To estimate the volume of a food item, a stereo pair is used to recover its structure as a 3D point cloud. A slice-based volume estimation approach is proposed that converts the 3D point cloud into a series of 2D slices. The proposed approach removes the need to know the distance between the two cameras by using disparities and depth information from a fiducial marker. The experimental results show that the proposed approach can provide an accurate estimate of food volume.
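The circular shifting step described in this abstract can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the thesis implementation: the feature matrix, its 3 x 4 size, and the function name `align_to_dominant` are assumptions, and a rotated or rescaled view is idealized here as a pure circular shift of the (scale x orientation) matrix.

```python
import numpy as np

def align_to_dominant(features: np.ndarray) -> np.ndarray:
    """Circularly shift a (scales x orientations) feature matrix so that
    its largest entry lands at index [0, 0]."""
    s_max, o_max = np.unravel_index(np.argmax(features), features.shape)
    return np.roll(features, shift=(-s_max, -o_max), axis=(0, 1))

# Hypothetical 3 scales x 4 orientations matrix of filter-response energies.
energies = np.array([[1.0, 2.0, 3.0, 4.0],
                     [5.0, 9.0, 6.0, 7.0],
                     [8.0, 0.5, 1.5, 2.5]])

# Idealized rotated and rescaled view: a circular shift along both axes.
shifted_view = np.roll(energies, shift=(1, 2), axis=(0, 1))

aligned_a = align_to_dominant(energies)
aligned_b = align_to_dominant(shifted_view)
```

Under this idealization both views map to the same aligned matrix, so they can be compared directly without knowing the shift in advance.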

UNSWorks

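Modélisation stochastique pour l'analyse d'images texturées (approches Bayésiennes pour la caractérisation dans le domaine des transformées) [Stochastic modeling for the analysis of textured images (Bayesian approaches for characterization in the transform domain)]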

Author: BERTHOUMIEU Yannick; LASMAR Nour-Eddine
Publication date: 01/01/2012

In this thesis we study the statistical modeling of textured images using multi-scale and multi-orientation representations. Based on results from neuroscience that liken the mechanism of human perception to a selective spatial-frequency scheme, we propose to characterize textures by probabilistic models of subband coefficients. Our contributions in this context consist firstly in proposing probabilistic models that take into account the leptokurtic nature and the possible asymmetry of the marginal distributions associated with textured content. First, to model the marginal statistics of subbands analytically, we introduce the asymmetric generalized Gaussian model. Second, we propose two families of multivariate models to take into account the dependencies between subband coefficients. The first family comprises the spherically invariant processes, for which we show that a Weibull characteristic distribution is appropriate. The second family is that of copula-based multivariate models. After determining the copula that characterizes the dependence structure suited to the texture, we propose a multivariate extension of the asymmetric generalized Gaussian distribution using the Gaussian copula. All proposed models are compared quantitatively in terms of goodness of fit, using statistical tests in both univariate and multivariate settings. Finally, the last part of our study concerns the experimental validation of the proposed models through texture-based image retrieval. To do this, we derive closed-form probabilistic metrics measuring the similarity between the models introduced, which we regard as the third contribution of this work. A comparative study is then conducted to compare the proposed probabilistic models with those of the state of the art.
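As an illustration of the marginal model named in this abstract, the sketch below evaluates one common parameterization of an asymmetric generalized Gaussian density, with separate left and right scale parameters and a shared shape parameter. The parameterization, the parameter names, and the numerical check are assumptions for illustration; the thesis may define the model differently.

```python
import math

def aggd_pdf(x, mu, alpha_left, alpha_right, beta):
    """Asymmetric generalized Gaussian density (one common form).
    beta < 2 gives heavier-than-Gaussian (leptokurtic) tails."""
    norm = beta / ((alpha_left + alpha_right) * math.gamma(1.0 / beta))
    scale = alpha_left if x < mu else alpha_right
    return norm * math.exp(-((abs(x - mu) / scale) ** beta))

def integrate(f, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Hypothetical parameters: sharper left side, wider right side.
density = lambda x: aggd_pdf(x, mu=0.0, alpha_left=1.0, alpha_right=2.0, beta=0.8)

total_mass = integrate(density, -200.0, 400.0)
left_mass = integrate(density, -200.0, 0.0)
```

In this form the probability mass to the left of `mu` is `alpha_left / (alpha_left + alpha_right)`, which is how unequal scale parameters encode asymmetry.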

OpenGrey Repository

Radon Projections as Image Descriptors for Content-Based Retrieval of Medical Images

Author: Sriram Aditya
Publication venue: University of Waterloo
Publication date: 16/04/2018

Clinical analysis and medical diagnosis of diverse diseases rely on medical imaging techniques, which allow specialists to visualize internal body organs and tissues in order to classify and treat diseases at an early stage. Content-Based Image Retrieval (CBIR) systems are a set of computer vision techniques for retrieving similar images from a large database based on suitable image representations. Particularly in radiology and histopathology, CBIR is a promising approach to effectively screen, understand, and retrieve images with a similar level of semantic description from a database of previously diagnosed cases, providing physicians with reliable assistance for diagnosis, treatment planning and research. Over the past decade, the development of CBIR systems in medical imaging has accelerated due to the increase in digitized modalities, an increase in computational efficiency (e.g., availability of GPUs), and progress in algorithm development in computer vision and artificial intelligence. Hence, medical specialists may use CBIR prototypes to query similar cases from a large image database based solely on the image content (and no text). Understanding the semantics of an image requires an expressive descriptor that can capture and represent unique and invariant features of an image. The Radon transform, one of the oldest techniques widely used in medical imaging, can capture the shape of organs in the form of a one-dimensional histogram by projecting parallel rays through a two-dimensional object of concern at a specific angle. In this work, the Radon transform is re-designed to (i) extract features and (ii) generate a descriptor for content-based retrieval of medical images. Radon projections, instead of raw images, are fed to a deep neural network in order to improve the generalization of the network.
Specifically, the framework provides Radon projections of an image to a deep autoencoder, from which the deepest layer is isolated and fed into a multi-layer perceptron for classification. This approach enables the network to (a) train much faster, as the Radon projections are computationally inexpensive compared to raw input images, and (b) perform more accurately, as Radon projections can present more pronounced and salient features to the network than raw images. This framework is validated on a publicly available radiography data set called "Image Retrieval in Medical Applications" (IRMA), consisting of 12,677 training and 1,733 test images, for which a classification accuracy of approximately 82% is achieved, outperforming all autoencoder strategies reported on the IRMA data set. The classification accuracy is calculated by dividing the total IRMA error, a calculation outlined by the authors of the data set, by the total number of test images. Finally, a compact handcrafted image descriptor based on the Radon transform, called "Forming Local Intersections of Projections" (FLIP), was designed in this work. The FLIP descriptor has been designed, through numerous experiments, for representing histopathology images. It is based on the Radon transform, with parallel projections applied in local 3x3 neighborhoods with a 2-pixel overlap on gray-level images (the staining of histopathology images is ignored). Using four equidistant projection directions in each window, the characteristics of the neighborhood are quantified by taking an element-wise minimum between each pair of adjacent projections in each window. Thereafter, the FLIP histogram (descriptor) for each image is constructed. A multi-resolution FLIP (mFLIP) scheme is also proposed, which is observed to outperform many state-of-the-art methods, deep features among them, when applied to the histopathology data set KIMIA Path24.
Experiments show a total classification accuracy of approximately 72% using SVM classification, which surpasses the current benchmark of approximately 66% on the KIMIA Path24 data set.
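To make the idea of parallel projections in a small window concrete, here is a minimal sketch that sums a 3x3 patch along four equidistant directions (columns, rows, and the two diagonal families). It is not the FLIP descriptor itself: the function name `four_projections`, the dictionary keys, and the example patch are assumptions, and the element-wise minimum and histogram steps from the abstract are left out because the abstract does not specify how projections of different lengths are combined.

```python
import numpy as np

def four_projections(patch: np.ndarray) -> dict:
    """Discrete parallel-ray sums of a 2-D patch along four directions."""
    n_rows, n_cols = patch.shape
    offsets = range(-(n_rows - 1), n_cols)
    flipped = patch[:, ::-1]  # mirror columns to reach the anti-diagonals
    return {
        "columns": patch.sum(axis=0),  # rays running down each column
        "rows": patch.sum(axis=1),     # rays running along each row
        "diagonals": np.array([np.trace(patch, offset=k) for k in offsets]),
        "anti_diagonals": np.array([np.trace(flipped, offset=k) for k in offsets]),
    }

# Hypothetical gray-level 3x3 neighborhood.
patch = np.arange(9, dtype=float).reshape(3, 3)
projections = four_projections(patch)
```

Each projection redistributes the same total intensity, so every one of the four vectors sums to the sum of the patch.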

University of Waterloo's Institutional Repository

Content-based image retrieval of museum images

Author: Ahmad Fauzi Mohammad Faizal
Publication date: 01/01/2004

Content-based image retrieval (CBIR) is becoming more and more important with the advance of multimedia and imaging technology. Among the many retrieval features associated with CBIR, texture retrieval is one of the most difficult. This is mainly because no satisfactory quantitative definition of texture exists at this time, and also because of the complex nature of texture itself. Another difficult problem in CBIR is query by low-quality images, that is, attempting to retrieve images using a poor-quality image as the query. Few content-based retrieval systems have addressed this problem. Wavelet analysis is a relatively new and promising tool for signal and image analysis. Its time-scale representation provides both spatial and frequency information, thus giving extra information compared to other image representation schemes. This research aims to address some of the problems of query by texture and query by low-quality images by exploiting the advantages that wavelet analysis has to offer, particularly in the context of museum image collections. A novel query by low-quality images algorithm is presented as a solution to the problem of poor retrieval performance using conventional methods. For the query by texture problem, this thesis provides a comprehensive evaluation of wavelet-based texture methods as well as a comparison with other techniques. A novel automatic texture segmentation algorithm and an improved block-oriented decomposition are proposed for use in query by texture. Finally, all the proposed techniques are integrated in a content-based image retrieval application for museum image collections.
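A minimal sketch of a wavelet-based texture feature of the general kind this abstract evaluates: the mean energy of the detail subbands at each decomposition level. The Haar wavelet, the two-level depth, and the function names are assumptions chosen for brevity; the abstract does not say which wavelet or features the thesis uses.

```python
import numpy as np

def haar_level(image: np.ndarray):
    """One level of an orthonormal 2-D Haar transform.
    Assumes even height and width."""
    a = image[0::2, 0::2]
    b = image[0::2, 1::2]
    c = image[1::2, 0::2]
    d = image[1::2, 1::2]
    approx = (a + b + c + d) / 2.0
    detail_1 = (a - b + c - d) / 2.0  # differences between neighboring columns
    detail_2 = (a + b - c - d) / 2.0  # differences between neighboring rows
    detail_3 = (a - b - c + d) / 2.0  # diagonal differences
    return approx, (detail_1, detail_2, detail_3)

def texture_signature(image: np.ndarray, levels: int = 2) -> np.ndarray:
    """Mean energy of every detail subband, coarsening level by level."""
    features = []
    current = image.astype(float)
    for _ in range(levels):
        current, details = haar_level(current)
        features.extend(float(np.mean(band ** 2)) for band in details)
    return np.array(features)

# Hypothetical 8x8 gray-level texture patch.
rng = np.random.default_rng(0)
texture = rng.random((8, 8))
signature = texture_signature(texture, levels=2)
```

Two images could then be compared by a distance between their signatures; the choice of distance is left open here.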

Southampton (e-Prints Soton)

OpenGrey Repository

Pattern Recognition

Publication venue: IntechOpen
Publication date: 20/04/2021

Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one-, two- or three-dimensional; the processing is done in real time or takes hours or days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.

Directory of Open Access Books (DOAB)

De-Duplication of Person's Identity Using Multi-Modal Biometrics

Author: N Pattabhi Ramaiah
Publication date: 01/01/2015

The objective of this work is to explore approaches to creating unique identities through a de-duplication process using multi-modal biometrics. Various government sectors around the world provide different services and welfare schemes for the benefit of people in society using an identity number. A unique identity (UID) number assigned to every person would obviate the need for a person to produce multiple documentary proofs of his/her identity when availing of any government or private service. In the process of creating a unique identity for a person, duplicate identities are possible, as the same person might try to obtain multiple identities in order to get extra benefits from the Government. These duplicate identities can be eliminated by the de-duplication process using multi-modal biometrics, namely iris, fingerprint, face and signature. De-duplication is the process of removing instances of multiple enrollments of the same person using the person's biometric data. As the number of people enrolled in the biometric system runs into billions, the time complexity of the de-duplication process increases. In this thesis, three different case studies are presented to address the performance issues of the de-duplication process in order to create a unique identity for each person.
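The de-duplication idea can be sketched as a threshold test between each new enrollment and the identities already accepted. Everything specific below is an assumption for illustration: biometric templates are stood in for by plain feature vectors, similarity by cosine similarity, and the 0.95 threshold and function name `deduplicate` are hypothetical. Real iris, fingerprint, face and signature matchers work differently.

```python
import numpy as np

def deduplicate(templates: np.ndarray, threshold: float = 0.95) -> list:
    """Return indices of enrollments kept as unique identities.
    An enrollment is dropped if it is too similar to one already kept."""
    unit = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    kept = []
    for i, candidate in enumerate(unit):
        # Compare against every identity accepted so far (quadratic overall).
        if all(float(candidate @ unit[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Hypothetical enrollments; row 2 is a near-copy of row 0.
templates = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.99, 0.01, 0.0],
                      [0.0, 0.0, 1.0]])
unique_ids = deduplicate(templates)
```

The all-pairs comparison is what makes the cost grow so quickly with the number of enrollments, which is the performance issue the abstract refers to.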

Research Archive of Indian Institute of Technology Hyderabad

Entropy in Image Analysis II

Publication venue: MDPI AG
Publication date: 01/05/2021

Image analysis is a fundamental task for any application where extracting information from images is required. The analysis requires highly sophisticated numerical and analytical methods, particularly for applications in medicine, security, and other fields where the results of the processing consist of data of vital importance. This is evident from the articles composing the Special Issue "Entropy in Image Analysis II", in which the authors used widely tested methods to verify their results. In reading the present volume, the reader will appreciate the richness of their methods and applications, in particular for medical imaging and image security, and a remarkable cross-fertilization among the proposed research areas.

Directory of Open Access Books (DOAB)

On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

Author: Antonisse Joey; Azzopardi George; Bennabhaktula Swaroop
Publication venue: Springer Science and Business Media LLC
Publication date: 31/10/2021

Deployed image classification pipelines typically depend on images captured in real-world environments. This means that images might be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has therefore attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. This operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
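The evaluation setup described here, test images perturbed with Gaussian and uniform noise at different levels, can be sketched as follows. This covers only the perturbation step, not the CORF operator or the CNN. The function names, the noise levels, the [0, 1] pixel range, and the clipping are assumptions; the 28x28 shape matches Fashion MNIST images.

```python
import numpy as np

def add_gaussian_noise(images, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to the [0, 1] range."""
    noisy = images + rng.normal(0.0, sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_uniform_noise(images, width, rng):
    """Add noise drawn uniformly from [-width, width] and clip to [0, 1]."""
    noisy = images + rng.uniform(-width, width, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

rng = np.random.default_rng(0)

# Hypothetical stand-in for a batch of four 28x28 test images.
clean = np.full((4, 28, 28), 0.5)

gaussian_test = add_gaussian_noise(clean, sigma=0.1, rng=rng)
uniform_test = add_uniform_noise(clean, width=0.2, rng=rng)
```

A model trained only on clean images would then be scored on `gaussian_test` and `uniform_test` at several noise levels to measure robustness to unseen noise.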

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Two and three dimensional segmentation of multimodal imagery

Author: Vantaram Sreenath Rao
Publication venue: RIT Scholar Works
Publication date: 01/10/2012

The role of segmentation in image understanding and analysis, computer vision, pattern recognition, remote sensing and medical imaging has grown significantly in recent years due to accelerated scientific advances in the acquisition of image data. This low-level analysis step is critical to numerous applications, with the primary goal of expediting and improving the effectiveness of subsequent high-level operations by providing a condensed and pertinent representation of image information. In this research, we propose a novel unsupervised segmentation framework for the meaningful segregation of 2-D/3-D image data across multiple modalities (color, remote-sensing and biomedical imaging) into non-overlapping partitions using several spatial-spectral attributes. Initially, our framework exploits the information obtained from detecting edges inherent in the data. To this end, using a vector gradient detection technique, pixels without edges are grouped and individually labeled to partition an initial portion of the input image content. Pixels with higher gradient densities are included through the dynamic generation of segments as the algorithm progresses, producing an initial region map. Subsequently, texture modeling is performed, and the obtained gradient, texture and intensity information, along with the initial partition map, is used in a multivariate refinement procedure that fuses groups with similar characteristics, yielding the final output segmentation. Experimental results, compared against published state-of-the-art segmentation techniques for color as well as multi/hyperspectral imagery, demonstrate the advantages of the proposed method. Furthermore, to achieve improved computational efficiency, we propose an extension of this methodology to a multi-resolution framework, demonstrated on color images.
Finally, this research also encompasses a 3-D extension of the algorithm, demonstrated on medical (Magnetic Resonance Imaging / Computed Tomography) volumes.
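The first stage described in this abstract, grouping and labeling pixels without edges, can be sketched as a gradient threshold followed by connected-component labeling. This is a simplified stand-in, not the thesis algorithm: the gradient here is the root of summed squared per-channel finite differences, and the 0.1 threshold, 4-connectivity, and function names are assumptions. The later region-growing, texture modeling and refinement stages are not shown.

```python
from collections import deque
import numpy as np

def vector_gradient(image: np.ndarray) -> np.ndarray:
    """Gradient magnitude of an H x W x C image, pooled over channels."""
    gy, gx = np.gradient(image.astype(float), axis=(0, 1))
    return np.sqrt((gx ** 2 + gy ** 2).sum(axis=2))

def label_flat_regions(image: np.ndarray, threshold: float = 0.1):
    """Label 4-connected groups of low-gradient pixels; edge pixels stay 0."""
    smooth = vector_gradient(image) < threshold
    labels = np.zeros(smooth.shape, dtype=int)
    height, width = smooth.shape
    next_label = 0
    for seed in zip(*np.nonzero(smooth)):
        if labels[seed]:
            continue  # already absorbed into an earlier region
        next_label += 1
        labels[seed] = next_label
        queue = deque([seed])
        while queue:  # breadth-first flood fill from the seed
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < height and 0 <= nc < width
                        and smooth[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = next_label
                    queue.append((nr, nc))
    return labels, next_label

# Hypothetical 6x8 color image: dark left half, bright right half.
image = np.zeros((6, 8, 3))
image[:, 4:, :] = 1.0
labels, region_count = label_flat_regions(image)
```

The pixels left unlabeled (value 0) lie along the edge between the two halves; in the abstract's framework such high-gradient pixels are assigned to segments in a later stage.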

RIT Scholar Works