434 research outputs found

    Weighted Nuclear Norm Minimization Based Tongue Specular Reflection Removal

    In computational tongue diagnosis, specular reflection is generally unavoidable during tongue image acquisition; it adversely affects feature extraction and tends to degrade diagnosis performance. In this paper, we propose a two-stage detection-and-inpainting pipeline to address this issue: (i) considering both highlight reflection and sub-reflection areas, a superpixel-based segmentation method is adopted to detect the specular reflection areas; (ii) by extending the weighted nuclear norm minimization (WNNM) model, a nonlocal inpainting method is proposed for specular reflection removal. Experimental results on synthetic and real images show that the proposed method detects the specular reflection areas accurately and restores tongue images effectively, preserving more natural texture information of the tongue body.
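    The closed-form step behind this style of nonlocal inpainting is weighted singular value thresholding. Below is a minimal NumPy sketch of that step applied to a matrix of stacked similar patches; the reweighting rule and constants are illustrative assumptions rather than the paper's exact settings.

```python
# Minimal sketch of WNNM-style weighted singular value thresholding.
# The weight rule (inverse of the singular values) and the constant c
# are illustrative assumptions, not the paper's exact parameters.
import numpy as np

def weighted_svt(Y, weights):
    """Solve min_X 0.5*||Y - X||_F^2 + sum_i w_i * sigma_i(X).

    For weights in non-descending order this has the closed-form solution
    of soft-thresholding each singular value by its own weight.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_thresh = np.maximum(s - weights, 0.0)      # per-value soft threshold
    return (U * s_thresh) @ Vt

def wnnm_restore_patch_group(Y, c=0.5, eps=1e-8):
    """Apply WNNM to a matrix of similar (nonlocal) patches stacked as columns.

    Weights proportional to 1/sigma_i penalize small singular values
    (noise, highlights) more than large ones (underlying texture).
    """
    _, s, _ = np.linalg.svd(Y, full_matrices=False)
    weights = c / (s + eps)
    return weighted_svt(Y, weights)

# toy usage: a low-rank patch matrix corrupted by noise
rng = np.random.default_rng(0)
clean = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 32))
noisy = clean + 0.1 * rng.normal(size=clean.shape)
restored = wnnm_restore_patch_group(noisy)
print(np.linalg.norm(noisy - clean), np.linalg.norm(restored - clean))
```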

    Registration and statistical analysis of the tongue shape during speech production

    This thesis analyzes the human tongue shape during speech production. First, a semi-supervised approach is derived for estimating the tongue shape from volumetric magnetic resonance imaging data of the human vocal tract. The results of this extraction are used to derive parametric tongue models. Next, a framework is presented for registering sparse motion capture data of the tongue by means of such a model, which makes it possible to generate full three-dimensional animations of the tongue. Finally, a multimodal and statistical text-to-speech system is developed that is able to synthesize audio and synchronized tongue motion from text.
    Funder: German Research Foundation

    The impact of pre- and post-image processing techniques on deep learning frameworks: A comprehensive review for digital pathology image analysis

    Recently, deep learning frameworks have rapidly become the main methodology for analyzing medical images. Due to their powerful learning ability and advantages in dealing with complex patterns, deep learning algorithms are ideal for image analysis challenges, particularly in the field of digital pathology. The variety of image analysis tasks in the context of deep learning includes classification (e.g., healthy vs. cancerous tissue), detection (e.g., lymphocyte and mitosis counting), and segmentation (e.g., nucleus and gland segmentation). The majority of recent machine learning methods in digital pathology include a pre- and/or post-processing stage that is integrated with a deep neural network. These stages, based on traditional image processing methods, are employed to make the subsequent classification, detection, or segmentation problem easier to solve. Several studies have shown how integrating pre- and post-processing methods within a deep learning pipeline can further increase the model's performance compared to the network by itself. The aim of this review is to provide an overview of the types of methods used within deep learning frameworks either to optimally prepare the input (pre-processing) or to improve the network output (post-processing), focusing on digital pathology image analysis. Many of the techniques presented here, especially the post-processing methods, are not limited to digital pathology and can be extended to almost any image analysis field.
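    As a concrete illustration of the kind of pipeline this review surveys, the following sketch (not taken from the review) wraps generic pre- and post-processing around a hypothetical segmentation network `model`; the intensity standardisation stands in for stain normalisation, and the morphological clean-up is one common post-processing choice among many.

```python
# Minimal sketch of traditional pre-/post-processing around a deep
# segmentation network in digital pathology. `model` is a placeholder for
# any network returning a per-pixel probability map.
import numpy as np
from scipy import ndimage

def preprocess(rgb_tile):
    """Per-channel intensity standardisation as a stand-in for stain normalisation."""
    x = rgb_tile.astype(np.float32) / 255.0
    return (x - x.mean(axis=(0, 1))) / (x.std(axis=(0, 1)) + 1e-6)

def postprocess(prob_map, threshold=0.5, min_size=50):
    """Clean the raw network output: threshold, remove specks, fill holes."""
    mask = prob_map > threshold
    mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))  # drop small specks
    mask = ndimage.binary_fill_holes(mask)                          # close interior holes
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    return np.isin(labels, np.flatnonzero(sizes >= min_size) + 1)   # remove tiny objects

def segment(rgb_tile, model):
    prob_map = model(preprocess(rgb_tile))   # hypothetical network call
    return postprocess(prob_map)
```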

    Local wavelet features for statistical object classification and localisation

    This article presents a system for texture-based probabilistic classification and localisation of 3D objects in 2D digital images and discusses selected applications. The objects are described by local feature vectors computed using the wavelet transform. In the training phase, object features are statistically modelled as normal density functions. In the recognition phase, a maximisation algorithm compares the learned density functions with the feature vectors extracted from a real scene and yields the classes and poses of the objects found in it. Experiments carried out on a real dataset of over 40,000 images demonstrate the robustness of the system in terms of classification and localisation accuracy. Finally, two important application scenarios are discussed, namely the classification of museum artefacts and the classification of metallography images.
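    A minimal sketch of this recognition scheme, under simplifying assumptions (single-level Haar sub-band energies as the local features and diagonal-covariance normal densities); it is not the system's actual feature set or density model.

```python
# Minimal sketch: local wavelet features modelled per class as normal
# densities, with classification by maximum log-likelihood. The Haar
# sub-band energy features and diagonal covariance are simplifying assumptions.
import numpy as np
import pywt

def local_wavelet_features(patch):
    """One-level 2D wavelet transform; the mean magnitude of each sub-band
    forms a small local feature vector."""
    cA, (cH, cV, cD) = pywt.dwt2(patch.astype(np.float32), 'haar')
    return np.array([np.mean(np.abs(b)) for b in (cA, cH, cV, cD)])

def fit_class_density(feature_vectors):
    """Fit a normal density (diagonal covariance) to one class's training features."""
    X = np.asarray(feature_vectors)
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def log_likelihood(x, mean, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def classify(patch, class_models):
    """class_models: dict mapping class name -> (mean, var)."""
    x = local_wavelet_features(patch)
    return max(class_models, key=lambda c: log_likelihood(x, *class_models[c]))
```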

    The application of manifold based visual speech units for visual speech recognition

    This dissertation presents a new learning-based representation, referred to as the Visual Speech Unit (VSU), for visual speech recognition (VSR). The automated recognition of human speech using only features from the visual domain has become a significant research topic that plays an essential role in the development of many multimedia systems such as audio-visual speech recognition (AVSR), mobile phone applications, human-computer interaction (HCI) and sign language recognition. Including visual information from the lips is opportune because it can improve the overall accuracy of audio or hand recognition algorithms, especially when such systems operate in environments characterized by a high level of acoustic noise. The main components of the developed VSR system (a) segment the mouth region of interest, (b) extract visual features from the input video in real time, and (c) identify the visual speech units. The major difficulty associated with VSR systems lies in identifying the smallest elements in the image sequences that represent lip movements in the visual domain. The Visual Speech Unit concept proposed here extends the standard viseme model currently applied in VSR: it includes not only the data associated with the articulation of the visemes but also the transitory information between consecutive visemes. A large part of this thesis is dedicated to analysing the performance of the new visual speech unit model compared with that attained by standard (MPEG-4) viseme models. Experimental results indicate that (1) the developed VSR system achieved 80-90% correct recognition when applied to the identification of 60 classes of VSUs, whereas the recognition rate for the standard set of MPEG-4 visemes was only 62-72%, and (2) in a 15-word recognition task with VSUs or visemes as the visual speech element, the word recognition accuracy based on VSUs was 7%-12% higher than that based on visemes.
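    To make the Visual Speech Unit idea concrete, the following illustrative sketch builds VSU label sequences from per-frame viseme labels, grouping each viseme's articulation frames with the transition into the following viseme; the exact frame-grouping rule is an assumption for illustration only, not the thesis's segmentation procedure.

```python
# Illustrative sketch of the VSU concept: a VSU spans one viseme's frames
# plus the transition frames into the next viseme, rather than modelling
# visemes in isolation. The "half of the next run" transition rule is an
# assumption made here for illustration.
from typing import List, Tuple

def visemes_to_vsus(viseme_frames: List[str]) -> List[Tuple[str, List[int]]]:
    """viseme_frames[i] is the viseme label of video frame i.
    Returns (VSU label, frame indices) pairs."""
    # find contiguous runs of identical viseme labels
    runs, start = [], 0
    for i in range(1, len(viseme_frames) + 1):
        if i == len(viseme_frames) or viseme_frames[i] != viseme_frames[start]:
            runs.append((viseme_frames[start], start, i))
            start = i
    # each VSU covers one viseme run and extends into the start of the next run
    vsus = []
    for (label, s, e), nxt in zip(runs, runs[1:] + [None]):
        if nxt is None:
            vsus.append((label, list(range(s, e))))
        else:
            nxt_label, ns, ne = nxt
            transition_end = ns + max(1, (ne - ns) // 2)
            vsus.append((f"{label}->{nxt_label}", list(range(s, transition_end))))
    return vsus

print(visemes_to_vsus(["p", "p", "a", "a", "a", "f", "f"]))
```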

    Facial soft tissue segmentation

    The importance of the face for socio-ecological interaction creates a high demand on any surgical intervention on the facial musculo-skeletal system. Bones and soft tissues are of major importance for any facial surgical treatment in order to guarantee an optimal, functional and aesthetic result. For this reason, surgeons want to plan, simulate and predict the outcome of the surgery pre-operatively, allowing for shorter operation times and improved quality. Accurate simulation requires exact segmentation of the facial tissues; thus semi-automatic segmentation techniques are required. This thesis proposes semi-automatic methods for segmenting the facial soft tissues, such as muscles, skin and fat, from CT and MRI datasets, using a Markov Random Field (MRF) framework. Due to image noise, artifacts, weak edges and multiple objects of similar appearance in close proximity, it is difficult to segment the object of interest using image information alone: segmentations leak at weak edges into neighboring structures that have a similar intensity profile. To overcome this problem, additional shape knowledge is incorporated into the energy function, which can then be minimized using Graph Cuts (GC). Incremental approaches that incorporate additional prior shape knowledge are presented. The proposed approaches are not object-specific and can be applied to segment any class of objects, anatomical or not, from medical or non-medical image datasets, whenever a statistical model is available.

    In the first approach, a 3D mean shape template is used as the shape prior and is integrated into the MRF-based energy function; the shape knowledge is encoded into the data and smoothness terms of the energy function, which constrains the segmented parts to a reasonable shape. In the second approach, to better handle the shape variations naturally found in the population, the fixed shape template is replaced by a more robust 3D statistical shape model based on Probabilistic Principal Component Analysis (PPCA). The advantage of using Probabilistic PCA is that it allows reconstructing the optimal shape and computing the remaining variance of the statistical model from partial information. Using an iterative method, the statistical shape model is then refined with image-based cues to obtain a better fit of the statistical model to the patient's muscle anatomy; these cues are based on the segmented muscle, edge information and the intensity likelihood of the muscle. Here, a linear shape update mechanism is used to fit the statistical model to the image-based cues. In the third approach, the shape refinement step is further improved by a non-linear shape update mechanism in which the vertices of the 3D mesh of the statistical model incur a non-linear penalty depending on the remaining variability at each vertex. The non-linear mechanism provides a more accurate shape update and helps achieve a finer fit of the statistical model to the image-based cues in areas where the shape variability is high.

    Finally, a unified approach is presented to segment the relevant facial muscles and the remaining facial soft tissues (skin and fat). One soft-tissue layer is removed at a time: the head and non-head regions are separated first, followed by the skin. Next, bones are removed from the dataset, followed by the separation of brain and non-brain regions and the removal of air cavities. Afterwards, facial fat is segmented using the standard Graph-Cuts approach. After separating these important anatomical structures, a 3D fixed shape template mesh of the facial muscles is used to segment the relevant facial muscles. The proposed methods are tested on the challenging example of segmenting the masseter muscle. The datasets were noisy, with almost all exhibiting mild to severe imaging artifacts, such as high-density artifacts caused by dental fillings and implants. Qualitative and quantitative experimental results show that incorporating prior shape knowledge effectively constrains leaking and yields better segmentation results.
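    The partial-information reconstruction that a PPCA-based statistical shape model permits can be sketched as a small regularised least-squares problem: known coordinates constrain the latent shape parameters, and the full shape is then read off the linear model. This is a simplified illustration under assumed inputs, not the thesis implementation.

```python
# Minimal sketch of reconstructing a full shape from partial observations
# with a linear (PPCA-style) statistical shape model. The ridge term and
# toy data are illustrative assumptions.
import numpy as np

def reconstruct_from_partial(mean, components, observed_idx, observed_vals, reg=1e-2):
    """mean: (d,) mean shape; components: (d, k) principal directions;
    observed_idx: indices of known coordinates; observed_vals: their values.
    Returns the reconstructed full shape vector (d,)."""
    W = components[observed_idx]                 # rows for observed coordinates
    r = observed_vals - mean[observed_idx]       # centred partial observation
    A = W.T @ W + reg * np.eye(W.shape[1])       # ridge-regularised normal equations
    b = np.linalg.solve(A, W.T @ r)              # latent shape parameters
    return mean + components @ b

# toy usage with a random 2-parameter linear shape model
rng = np.random.default_rng(1)
mean = rng.normal(size=30)
components = rng.normal(size=(30, 2))
full_shape = mean + components @ np.array([1.5, -0.7])
observed_idx = np.arange(0, 30, 3)               # every third coordinate is "known"
recon = reconstruct_from_partial(mean, components, observed_idx, full_shape[observed_idx])
print(np.max(np.abs(recon - full_shape)))        # small reconstruction error
```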

    Digital image colorimetry for determination of sulfonamides in water

    This work aims to develop a digital image-based colorimetric method for screening sulfonamides (SAs) in water. It is based on determining SAs in water by analyzing the colour response with an automatic image processing algorithm. Antimicrobial agents are considered emerging pollutants in water because of their potential to accelerate the spread of bacterial resistance genes and because of their harmful effect on ecosystems through the death or inhibition of the natural microbiota. Sulfonamides are an important antimicrobial group widely used in both human and veterinary medicine. Studies have demonstrated that SAs are very mobile and highly available in soil, with no bioaccumulation. Furthermore, these compounds appear to be quite resistant to biodegradation in surface water, which favours contamination of the aquatic environment. Thus, monitoring SA levels in water is very important for their aquatic risk assessment. Several methods for the determination of SAs in water have been developed, most of them based on coupling high-performance liquid chromatography (LC) with mass spectrometry (MS). LC-MS is widely used due to its high sensitivity and specificity; however, this approach is very expensive and does not allow in situ analysis. Hence, the development of field-deployable screening methods is required. Methods based on digital image colorimetry have been broadly applied to point-of-care tests, forensic analysis and environmental monitoring. Digital image-based methods are very promising as field screening techniques because they are fast, low-cost, portable and easy to handle.
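    A minimal sketch of the digital image colorimetry workflow described above: a colour signal is extracted from a region of interest, a calibration line is fitted from standards of known concentration, and an unknown sample is read off that line. Using the mean green-channel intensity as the signal, and the toy calibration values, are illustrative assumptions rather than the developed method.

```python
# Minimal sketch of digital image colorimetry: ROI colour signal,
# calibration line, and concentration prediction. Signal choice and
# numbers are illustrative assumptions.
import numpy as np

def colour_signal(rgb_image, roi):
    """Mean green-channel intensity inside a rectangular region of interest.
    roi = (row0, row1, col0, col1)."""
    r0, r1, c0, c1 = roi
    return rgb_image[r0:r1, c0:c1, 1].astype(np.float64).mean()

def fit_calibration(concentrations, signals):
    """Least-squares line: signal = slope * concentration + intercept."""
    slope, intercept = np.polyfit(concentrations, signals, deg=1)
    return slope, intercept

def predict_concentration(signal, slope, intercept):
    return (signal - intercept) / slope

# toy usage with synthetic calibration standards
conc = np.array([0.0, 1.0, 2.0, 4.0, 8.0])            # e.g. mg/L of a sulfonamide standard
sig = 200.0 - 12.0 * conc + np.random.default_rng(2).normal(0, 1.0, conc.size)
slope, intercept = fit_calibration(conc, sig)
print(predict_concentration(155.0, slope, intercept))  # estimated concentration of an unknown
```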