
    Multimodal Fusion of Polynomial Classifiers for Automatic Person Recognition

    With the prevalence of the information age, privacy and personalization are at the forefront of today's society. As such, biometrics are viewed as essential components of current and evolving technological systems. Consumers demand unobtrusive and noninvasive approaches. In our previous work, we have demonstrated a speaker verification system that meets these criteria. However, there are additional constraints for fielded systems. The required recognition transactions are often performed in adverse environments and across diverse populations, necessitating robust solutions. There are two significant problem areas in current generation speaker verification systems. The first is the difficulty in acquiring clean audio signals (in all environments) without encumbering the user with a head-mounted close-talking microphone. Second, unimodal biometric systems do not work with a significant percentage of the population. To combat these issues, multimodal techniques are being investigated to improve system robustness to environmental conditions, as well as improve overall accuracy across the population. We propose a multimodal approach that builds on our current state-of-the-art speaker verification technology. In order to maintain the transparent nature of the speech interface, we focus on optical sensing technology to provide the additional modality, giving us an audio-visual person recognition system. For the audio domain, we use our existing speaker verification system. For the visual domain, we focus on lip motion. This is chosen, rather than static face or iris recognition, because it provides dynamic information about the individual. In addition, the lip dynamics can aid speech recognition to provide liveness testing. The visual processing method makes use of both color and edge information, combined within a Markov random field (MRF) framework, to localize the lips. Geometric features are extracted and input to a polynomial classifier for the person recognition process. A late integration approach, based on a probabilistic model, is employed to combine the two modalities. The system is tested on the XM2VTS database combined with AWGN (in the audio domain) over a range of signal-to-noise ratios.
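    The classification and fusion steps described above can be pictured in a few lines. The following is a minimal sketch, assuming per-frame geometric lip features, a per-speaker weight vector for the polynomial classifier, and an illustrative fusion weight; none of these names or values come from the paper.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def polynomial_score(frames, speaker_weights, degree=2):
    """Average per-frame score against one speaker's polynomial-classifier
    weight vector (frames: n_frames x n_features array)."""
    expanded = PolynomialFeatures(degree).fit_transform(frames)  # monomial basis terms
    return float((expanded @ speaker_weights).mean())

def fuse_scores(audio_score, visual_score, w_audio=0.7):
    """Late integration: convex combination of the two modality scores.
    The 0.7 weight is an assumption, not a value from the paper."""
    return w_audio * audio_score + (1.0 - w_audio) * visual_score

# Verification decision: accept the identity claim when the fused score
# clears a threshold tuned on held-out data.
# accept = fuse_scores(s_audio, s_visual) > theta
```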

    A Survey on Deep Learning in Medical Image Analysis

    Deep learning algorithms, in particular convolutional networks, have rapidly become a methodology of choice for analyzing medical images. This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year. We survey the use of deep learning for image classification, object detection, segmentation, registration, and other tasks and provide concise overviews of studies per application area. Open challenges and directions for future research are discussed.
    Comment: Revised survey includes expanded discussion section and reworked introductory section on common deep architectures. Added missed papers from before Feb 1st 2017.

    Hybrid Multilevel Thresholding and Improved Harmony Search Algorithm for Segmentation

    This paper proposes a new image segmentation method that hybridizes multilevel thresholding with an improved harmony search algorithm, a metaheuristic for finding solution vectors with increased accuracy. The proposed method generates random candidate solutions whose quality is evaluated with the Otsu objective function, and its operators continue to evolve the candidates until the optimal solution is found. The datasets used in this study are retina, tongue, Lenna, baboon, and cameraman images. The experimental results show that the method performs well as measured by peak signal-to-noise ratio (PSNR): the PSNR averages 40.342 dB for the retina images and 35.340 dB for the tongue images, while Lenna, baboon, and cameraman average 33.781 dB, 33.499 dB, and 34.869 dB, respectively. These results suggest the method can support object recognition and identification with a high degree of accuracy.
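    As a sketch of the underlying optimization, the following pairs a multilevel Otsu objective with plain harmony search over a grayscale histogram. The specific accuracy improvements the paper makes to harmony search are not reproduced, and all parameter values are illustrative.

```python
import numpy as np

def otsu_objective(hist, thresholds):
    """Between-class variance of the histogram under the given thresholds (maximized)."""
    p = hist / hist.sum()
    bins = np.arange(len(p))
    edges = [0, *np.sort(thresholds).tolist(), len(p)]
    mu_total = (p * bins).sum()
    var = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        w = p[lo:hi].sum()
        if w > 0:
            mu = (p[lo:hi] * bins[lo:hi]).sum() / w
            var += w * (mu - mu_total) ** 2
    return var

def harmony_search_thresholds(hist, k=3, memory=20, iters=500, hmcr=0.9, par=0.3, seed=None):
    """Plain harmony search for k thresholds maximizing the Otsu objective."""
    rng = np.random.default_rng(seed)
    hm = rng.integers(1, 255, size=(memory, k))        # harmony memory
    fit = np.array([otsu_objective(hist, h) for h in hm])
    for _ in range(iters):
        # Memory consideration vs. random selection, decided per dimension
        pick = hm[rng.integers(memory, size=k), np.arange(k)]
        new = np.where(rng.random(k) < hmcr, pick, rng.integers(1, 255, size=k))
        # Pitch adjustment: small random perturbation of some dimensions
        bump = np.clip(new + rng.integers(-5, 6, size=k), 1, 254)
        new = np.where(rng.random(k) < par, bump, new)
        f = otsu_objective(hist, new)
        worst = int(fit.argmin())
        if f > fit[worst]:                             # replace the worst harmony
            hm[worst], fit[worst] = new, f
    return np.sort(hm[int(fit.argmax())])

# Usage: thresholds = harmony_search_thresholds(np.bincount(img.ravel(), minlength=256))
```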

    Facial soft tissue segmentation

    The importance of the face for socio-ecological interaction is the cause of the high demands placed on any surgical intervention on the facial musculo-skeletal system. Bones and soft tissues are of major importance for any facial surgical treatment to guarantee an optimal, functional and aesthetic result. For this reason, surgeons want to pre-operatively plan, simulate and predict the outcome of the surgery, allowing for shorter operation times and improved quality. Accurate simulation requires exact segmentation of the facial tissues, so semi-automatic segmentation techniques are required. This thesis proposes semi-automatic methods for segmentation of the facial soft tissues, such as muscles, skin and fat, from CT and MRI datasets, using a Markov random field (MRF) framework. Due to image noise, artifacts, weak edges and multiple objects of similar appearance in close proximity, it is difficult to segment the object of interest using image information alone: segmentations would leak at weak edges into neighboring structures that have a similar intensity profile. To overcome this problem, additional shape knowledge is incorporated into the energy function, which can then be minimized using graph cuts (GC). Incremental approaches that incorporate additional prior shape knowledge are presented. The proposed approaches are not object specific and can be applied to segment any class of objects, whether anatomical or non-anatomical, from medical or non-medical image datasets, whenever a statistical model is present. In the first approach, a 3D mean shape template is used as the shape prior, which is integrated into the MRF-based energy function. Here, the shape knowledge is encoded into the data and smoothness terms of the energy function, constraining the segmented parts to a reasonable shape. In the second approach, to better handle the shape variations naturally found in the population, the fixed shape template is replaced by a more robust 3D statistical shape model based on probabilistic principal component analysis (PPCA). The advantages of using probabilistic PCA are that it allows the optimal shape to be reconstructed and the remaining variance of the statistical model to be computed from partial information. Using an iterative method, the statistical shape model is then refined with image-based cues to better fit the statistical model to the patient's muscle anatomy. These image cues are based on the segmented muscle, edge information and the intensity likelihood of the muscle. Here, a linear shape update mechanism is used to fit the statistical model to the image-based cues. In the third approach, the shape refinement step is further improved by a non-linear shape update mechanism, in which vertices of the 3D mesh of the statistical model incur a non-linear penalty depending on the remaining variability of the vertex. The non-linear shape update mechanism provides a more accurate shape update and enables a finer fitting of the statistical model to the image-based cues in areas where the shape variability is high. Finally, a unified approach is presented to segment the relevant facial muscles and the remaining facial soft tissues (skin and fat). One soft-tissue layer is removed at a time: first the head and non-head regions are separated, followed by the skin. In the next step, bones are removed from the dataset, followed by the separation of the brain and non-brain regions as well as the removal of air cavities.
    Afterwards, facial fat is segmented using the standard graph-cuts approach. After separating the important anatomical structures, a fixed 3D shape template mesh of the facial muscles is finally used to segment the relevant facial muscles. The proposed methods are tested on the challenging example of segmenting the masseter muscle. The datasets were noisy, and almost all exhibited mild to severe imaging artifacts, such as high-density artifacts caused by dental fillings and implants. Qualitative and quantitative experimental results show that incorporating prior shape knowledge effectively constrains leakage and yields better segmentation results.
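    The core energy minimization can be sketched with a generic binary MRF solved by graph cuts. The sketch below assumes the PyMaxflow library and folds an illustrative signed-distance shape prior into the unary terms; the thesis's exact energy terms and shape encoding are not reproduced here.

```python
import numpy as np
import maxflow  # PyMaxflow

def segment_with_shape_prior(img, shape_dist, mu_fg, mu_bg, lam=2.0, alpha=0.5):
    """Binary graph-cut segmentation with a shape prior in the unary costs.

    img        : 2-D intensity image
    shape_dist : signed distance to the prior shape (negative inside the shape)
    mu_fg/bg   : illustrative foreground/background intensity means
    """
    # Unary terms: intensity likelihood plus a penalty for disagreeing with the shape
    d_fg = (img - mu_fg) ** 2 + alpha * np.maximum(shape_dist, 0.0)
    d_bg = (img - mu_bg) ** 2 + alpha * np.maximum(-shape_dist, 0.0)

    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(img.shape)
    g.add_grid_edges(nodes, lam)           # Potts smoothness on the 4-neighborhood
    g.add_grid_tedges(nodes, d_fg, d_bg)   # terminal capacities carry the unary costs
    g.maxflow()
    return g.get_grid_segments(nodes)      # True where the pixel is labeled foreground
```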

    Infant Cry Signal Processing, Analysis, and Classification with Artificial Neural Networks

    As a special type of speech and environmental sound, infant cry has been a growing research area over the past two decades, covering infant cry reason classification, pathological infant cry identification, and infant cry detection. In this dissertation, we build a new dataset, explore new feature extraction methods, and propose novel classification approaches to improve infant cry classification accuracy and to identify diseases by learning from infant cry signals. We propose a method that generates weighted prosodic features combined with acoustic features for a deep learning model to improve the performance of asphyxiated infant cry identification. The combined feature matrix captures the diversity of variations within infant cries, and the result outperforms all other related studies on asphyxiated baby crying classification. We propose a fast, non-invasive method that uses infant cry signals with convolutional neural network (CNN) based age classification to diagnose abnormalities of infant vocal tract development as early as 4 months of age. Experiments uncover the pattern and tendency of vocal tract changes and predict vocal tract abnormality by classifying the cry signals into a younger age category. We propose an approach that generates a hybrid feature set and uses prior knowledge in a multi-stage CNN model for robust infant sound classification. The dominant and auxiliary features within the set enlarge the coverage while keeping a good resolution for modeling the diversity of variations within infant sound, and the experimental results show encouraging improvements on two related databases. We propose an approach using a graph convolutional network (GCN) with transfer learning for robust infant cry reason classification. Non-fully connected graphs, built from the similarities among the relevant nodes, capture the short-term and long-term effects of infant cry signals related to intra-class and inter-class messages. With as little as 20% labeled training data, our model outperforms a CNN model trained with 80% labeled data in both supervised and semi-supervised settings. Lastly, we apply mel-spectrogram decomposition to infant cry classification and propose a fusion method to further improve classification performance.
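    One concrete piece shared by the CNN-based approaches is the spectrogram front end. A minimal sketch, assuming librosa and illustrative parameter values (the sampling rate, mel-band count, and clip length are not taken from the dissertation):

```python
import numpy as np
import librosa

def cry_to_logmel(path, sr=16000, n_mels=64, duration=5.0):
    """Fixed-size log-mel spectrogram of one cry recording, suitable as CNN input."""
    y, _ = librosa.load(path, sr=sr, duration=duration)
    y = np.pad(y, (0, max(0, int(sr * duration) - len(y))))  # zero-pad short clips
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)              # shape: (n_mels, n_frames)
```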

    Visual Passwords Using Automatic Lip Reading

    This paper presents a visual passwords system to increase security. The system depends mainly on recognizing the speaker from the visual speech signal alone. The proposed scheme works in two stages: setting the visual password and verification. In the setting stage, the system requests the user to utter a selected password; a video recording of the user's face is captured and processed by a dedicated words-based visual speech recognition (VSR) system, which extracts a sequence of feature vectors. In the verification stage, the same procedure is executed and the extracted features are compared with the stored visual password. The proposed scheme has been evaluated using a video database of 20 different speakers (10 females and 10 males), plus 15 more males in another video database, across different experiment sets. The evaluation proved the system's feasibility, with an average error rate ranging from 7.63% to 20.51% in the worst tested scenario; the approach therefore has the potential to be practical when supported by other conventional authentication methods such as usernames and passwords.
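    The verification stage reduces to comparing a probe feature sequence against the enrolled one. The sketch below uses dynamic time warping as the sequence matcher, which is an assumption; the paper does not spell out its comparison algorithm, and the threshold is illustrative.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Length-normalized dynamic-time-warping distance between two
    sequences of feature vectors (n x d and m x d arrays)."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])  # frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def verify(probe, enrolled, threshold=1.0):
    """Accept when the probe is close enough to the stored visual password."""
    return dtw_distance(probe, enrolled) < threshold
```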

    RGB-D Scene Representations for Prosthetic Vision

    This thesis presents a new approach to scene representation for prosthetic vision. Structurally salient information from the scene is conveyed through the prosthetic vision display. Given the low resolution and dynamic range of the display, this enables robust identification and reliable interpretation of key structural features that are missed when using standard appearance-based scene representations. Specifically, two different types of salient structure are investigated: salient edge structure, for depiction of scene shape to the user; and salient object structure, for emulation of biological attention deployment when viewing a scene. This thesis proposes and evaluates novel computer vision algorithms for extracting salient edge and salient object structure from RGB-D input. Extraction of salient edge structure from the scene is first investigated through low-level analysis of surface shape. Our approach is based on the observation that regions of irregular surface shape, such as the boundary between the wall and the floor, tend to be more informative of scene structure than uniformly shaped regions. We detect these surface irregularities through multi-scale analysis of iso-disparity contour orientations, providing a real-time method that robustly identifies important scene structure. This approach is then extended by using a deep CNN to learn high-level information for distinguishing salient edges from structural texture. A novel depth input encoding called the depth surface descriptor (DSD) is presented, which better captures scene geometry that corresponds to salient edges, improving the learned model. These methods provide robust detection of salient edge structure in the scene. The detection of salient object structure is first achieved by noting that salient objects often have contrasting shape from their surroundings. Contrasting shape in the depth image is captured through the proposed histogram of surface orientations (HOSO) feature. This feature is used to modulate depth and colour contrast in a saliency detection framework, improving the precision of saliency seed regions and, through this, the accuracy of the final detection. After this, a novel formulation of structural saliency is introduced based on the angular measure of local background enclosure (LBE). This formulation addresses fundamental limitations of depth contrast methods and is not reliant on foreground depth contrast in the scene. Saliency is instead measured through the degree to which a candidate patch exhibits foreground structure. The effectiveness of the proposed approach is evaluated through both standard datasets and user studies that measure the contribution of structure-based representations. Our methods are found to measure salient structure in the scene more effectively than existing methods, and our approach results in improved performance compared to standard methods during practical use of an implant display.
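    As an illustration of the depth-based shape features discussed above, the sketch below computes a HOSO-style histogram of surface-normal orientations from a depth image. It is a simplified stand-in under stated assumptions (gradient-based normals, azimuth-only binning); the thesis's exact HOSO and DSD formulations are not reproduced.

```python
import numpy as np

def surface_orientation_histogram(depth, n_bins=8):
    """Histogram of surface-normal azimuths computed from a depth image."""
    dz_dy, dz_dx = np.gradient(depth.astype(np.float64))
    # Normals of the depth surface z = f(x, y): (-dz/dx, -dz/dy, 1), normalized
    normals = np.dstack((-dz_dx, -dz_dy, np.ones_like(depth, dtype=np.float64)))
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    # Bin the in-plane orientation (azimuth) of each normal
    azimuth = np.arctan2(normals[..., 1], normals[..., 0])
    hist, _ = np.histogram(azimuth, bins=n_bins, range=(-np.pi, np.pi))
    return hist / hist.sum()
```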