64 research outputs found

    Methods for Photoacoustic Image Reconstruction Exploiting Properties of Curvelet Frame

    Get PDF
    Curvelet frame is of special significance for photoacoustic tomography (PAT) due to its sparsifying and microlocalisation properties. In this PhD project, we explore the methods for image reconstruction in PAT with flat sensor geometry using Curvelet properties. This thesis makes five distinct contributions: (i) We investigate formulation of the forward, adjoint and inverse operators for PAT in Fourier domain. We derive a one-to-one map between wavefront directions in image and data spaces in PAT. Combining the Fourier operators with the wavefront map allows us to create the appropriate PAT operators for solving limited-view problems due to limited angular sensor sensitivity. (ii) We devise a concept of wedge restricted Curvelet transform, a modification of standard Curvelet transform, which allows us to formulate a tight frame of wedge restricted Curvelets on the range of the PAT forward operator for PAT data representation. We consider details specific to PAT data such as symmetries, time oversampling and their consequences. We further adapt the wedge restricted Curvelet to decompose the wavefronts into visible and invisible parts in the data domain as well as in the image domain. (iii) We formulate a two step approach based on the recovery of the complete volume of the photoacoustic data from the sub-sampled data followed by the acoustic inversion, and a one step approach where the photoacoustic image is directly recovered from the subsampled data. The wedge restricted Curvelet is used as the sparse representation of the photoacoustic data in the two step approach. (iv) We discuss a joint variational approach that incorporates Curvelet sparsity in photoacoustic image domain and spatio-temporal regularization via optical flow constraint to achieve improved results for dynamic PAT reconstruction. (v) We consider the limited-view problem due to limited angular sensitivity of the sensor (see (i) for the formulation of the corresponding fast operators in Fourier domain). We propose complementary information learning approach based on splitting the problem into visible and invisible singularities. We perform a sparse reconstruction of the visible Curvelet coefficients using compressed sensing techniques and propose a tailored deep neural network architecture to recover the invisible coefficients

    Information selection and fusion in vision systems

    Get PDF
    Handling the enormous amounts of data produced by data-intensive imaging systems, such as multi-camera surveillance systems and microscopes, is technically challenging. While image and video compression help to manage the data volumes, they do not address the basic problem of information overflow. In this PhD we tackle the problem in a more drastic way. We select information of interest to a specific vision task, and discard the rest. We also combine data from different sources into a single output product, which presents the information of interest to end users in a suitable, summarized format. We treat two types of vision systems. The first type is conventional light microscopes. During this PhD, we have exploited for the first time the potential of the curvelet transform for image fusion for depth-of-field extension, allowing us to combine the advantages of multi-resolution image analysis for image fusion with increased directional sensitivity. As a result, the proposed technique clearly outperforms state-of-the-art methods, both on real microscopy data and on artificially generated images. The second type is camera networks with overlapping fields of view. To enable joint processing in such networks, inter-camera communication is essential. Because of infrastructure costs, power consumption for wireless transmission, etc., transmitting high-bandwidth video streams between cameras should be avoided. Fortunately, recently designed 'smart cameras', which have on-board processing and communication hardware, allow distributing the required image processing over the cameras. This permits compactly representing useful information from each camera. We focus on representing information for people localization and observation, which are important tools for statistical analysis of room usage, quick localization of people in case of building fires, etc. To further save bandwidth, we select which cameras should be involved in a vision task and transmit observations only from the selected cameras. We provide an information-theoretically founded framework for general purpose camera selection based on the Dempster-Shafer theory of evidence. Applied to tracking, it allows tracking people using a dynamic selection of as little as three cameras with the same accuracy as when using up to ten cameras

    U-ASD Net: supervised crowd counting based on semantic segmentation and adaptive scenario discovery

    Get PDF
    Crowd counting considers one of the most significant and challenging issues in computer vision and deep learning communities, whose applications are being utilized for various tasks. While this issue is well studied, it remains an open challenge to manage perspective distortions and scale variations. How well these problems are resolved has a huge impact on predicting a high-quality crowd density map. In this study, a hybrid and modified deep neural network (U-ASD Net), based on U-Net and adaptive scenario discovery (ASD), is proposed to get precise and effective crowd counting. The U part is produced by replacing the nearest upsampling in the encoder of U-Net with max-unpooling. This modification provides a better crowd counting performance by capturing more spatial information. The max-unpooling layers upsample the feature maps based on the max locations held from the downsampling process. The ASD part is constructed with three light pathways, two of which have been learned to reflect various densities of the crowd and define the appropriate geometric configuration employing various sizes of the receptive field. The third pathway is an adaptation path, which implicitly discovers and models complex scenarios to recalibrate pathway-wise responses adaptively. ASD has no additional branches to avoid increasing the complexity. The designed model is end-to-end trainable. This integration provides an effective model to count crowds in both dense and sparse datasets. It also predicts an elevated quality density map with a high structural similarity index and a high peak signal-to-noise ratio. Several comprehensive experiments on four popular datasets for crowd counting have been carried out to demonstrate the proposed method's promising performance compared to other state-of-the-art approaches. Furthermore, a new dataset with its manual annotations, called Haramain with three different scenes and different densities, is introduced and used for evaluating the U-ASD Net

    Biometric face recognition using multilinear projection and artificial intelligence

    Get PDF
    PhD ThesisNumerous problems of automatic facial recognition in the linear and multilinear subspace learning have been addressed; nevertheless, many difficulties remain. This work focuses on two key problems for automatic facial recognition and feature extraction: object representation and high dimensionality. To address these problems, a bidirectional two-dimensional neighborhood preserving projection (B2DNPP) approach for human facial recognition has been developed. Compared with 2DNPP, the proposed method operates on 2-D facial images and performs reductions on the directions of both rows and columns of images. Furthermore, it has the ability to reveal variations between these directions. To further improve the performance of the B2DNPP method, a new B2DNPP based on the curvelet decomposition of human facial images is introduced. The curvelet multi- resolution tool enhances the edges representation and other singularities along curves, and thus improves directional features. In this method, an extreme learning machine (ELM) classifier is used which significantly improves classification rate. The proposed C-B2DNPP method decreases error rate from 5.9% to 3.5%, from 3.7% to 2.0% and from 19.7% to 14.2% using ORL, AR, and FERET databases compared with 2DNPP. Therefore, it achieves decreases in error rate more than 40%, 45%, and 27% respectively with the ORL, AR, and FERET databases. Facial images have particular natural structures in the form of two-, three-, or even higher-order tensors. Therefore, a novel method of supervised and unsupervised multilinear neighborhood preserving projection (MNPP) is proposed for face recognition. This allows the natural representation of multidimensional images 2-D, 3-D or higher-order tensors and extracts useful information directly from tensotial data rather than from matrices or vectors. As opposed to a B2DNPP which derives only two subspaces, in the MNPP method multiple interrelated subspaces are obtained over different tensor directions, so that the subspaces are learned iteratively by unfolding the tensor along the different directions. The performance of the MNPP has performed in terms of the two modes of facial recognition biometrics systems of identification and verification. The proposed supervised MNPP method achieved decrease over 50.8%, 75.6%, and 44.6% in error rate using ORL, AR, and FERET databases respectively, compared with 2DNPP. Therefore, the results demonstrate that the MNPP approach obtains the best overall performance in various learning scenarios

    Subjective and objective quality assessment of ancient degraded documents

    Get PDF
    Archiving, restoration and analysis of damaged manuscripts have been largely increased in recent decades. Usually, these documents are physically degraded because of aging and improper handing. They also cannot be processed manually because a massive volume of these documents exist in libraries and archives around the world. Therefore, automatic methodologies are needed to preserve and to process their content. These documents are usually processed through their images. Degraded document image processing is a difficult task mainly because of the existing physical degradations. While it can be very difficult to accurately locate and remove such distortions, analyzing the severity and type(s) of these distortions is feasible. This analysis provides useful information on the type and severity of degradations with a number of applications. The main contributions of this thesis are to propose models for objectively assessing the physical condition of document images and to classify their degradations. In this thesis, three datasets of degraded document images along with the subjective ratings for each image are developed. In addition, three no-reference document image quality assessment (NR-DIQA) metrics are proposed for historical and medieval document images. It should be mentioned that degraded medieval document images are a subset of the historical document images and may contain both graphical and textual content. Finally, we propose a degradation classification model in order to identify common distortion types in old document images. Essentially, existing no reference image quality assessment (NR-IQA) metrics are not designed to assess physical document distortions. In the first contribution, we propose the first dataset of degraded document images along with the human opinion scores for each document image. This dataset is introduced to evaluate the quality of historical document images. We also propose an objective NR-DIQA metric based on the statistics of the mean subtracted contrast normalized (MSCN) coefficients computed from segmented layers of each document image. The segmentation into four layers of foreground and background is done based on an analysis of the log-Gabor filters. This segmentation is based on the assumption that the sensitivity of the human visual system (HVS) is different at the locations of text and non-text. Experimental results show that the proposed metric has comparable or better performance than the state-of-the-art metrics, while it has a moderate complexity. Degradation identification and quality assessment can complement each other to provide information on both type and severity of degradations in document images. Therefore, we introduced, in the second contribution, a multi-distortion historical document image database that can be used for the research on quality assessment of degraded documents as well as degradation classification. The developed dataset contains historical document images which are classified into four categories based on their distortion types, namely, paper translucency, stain, readers’ annotations, and worn holes. An efficient NR-DIQA metric is then proposed based on three sets of spatial and frequency image features extracted from two layers of text and non-text. In addition, these features are used to estimate the probability of the four aforementioned physical distortions for the first time in the literature. Both proposed quality assessment and degradation classification models deliver a very promising performance. Finally, we develop in the third contribution a dataset and a quality assessment metric for degraded medieval document (DMD) images. This type of degraded images contains both textual and pictorial information. The introduced DMD dataset is the first dataset in its category that also provides human ratings. Also, we propose a new no-reference metric in order to evaluate the quality of DMD images in the developed dataset. The proposed metric is based on the extraction of several statistical features from three layers of text, non-text, and graphics. The segmentation is based on color saliency with assumption that pictorial parts are colorful. It also follows HVS that gives different weights to each layer. The experimental results validate the effectiveness of the proposed NR-DIQA strategy for DMD images

    Thermal Cameras and Applications:A Survey

    Get PDF

    Evaluation and Prediction of Evoked Emotions Induced by Image Manipulations

    Get PDF
    Various image editing tools make our pictures more attractive, and at the same time, evoke different emotional responses. With powerful and easy-to-use imaging applications, capturing, editing and then sharing pictures have become daily life for many. This paper investigates the influence of several image manipulations on evoked emotions for different types of images. To do so, various types of images clustered in different categories, were collected from Instagram and subjective evaluations were conducted via crowdsourcing to gather the emotional responses on different manipulations as perceived by subjects. Evaluation results show that certain image manipulations can induce different evoked emotions on transformed pictures when compared to the original ones. However, such changes in image emotions due to manipulation are highly content dependent. Then, we conducted a machine learning based experiment, in attempt to predict the emotions of a manipulated image given its original version and the desired manipulation method. Experimental results present a promising performance of such a prediction model, which could pave the road to automatic selection or recommendation of image editing tools that can efficiently transform or emphasize desired emotions in pictures
    • …
    corecore