120 research outputs found

    Modeling small objects under uncertainties : novel algorithms and applications.

    Get PDF
    Active Shape Models (ASM), Active Appearance Models (AAM) and Active Tensor Models (ATM) are common approaches for modeling elastic (deformable) objects. These models require an ensemble of shapes and textures, annotated by human experts, in order to identify the model order and parameters. A candidate object may then be represented by a weighted sum of basis functions generated by an optimization process. These methods have been very effective for modeling deformable objects in biomedical imaging, biometrics, computer vision and graphics. They have been tried mainly on objects with known features that are amenable to manual (expert) annotation, and have not been examined on objects whose ambiguities are too severe for experts to characterize them uniquely. This dissertation presents a unified approach for modeling, detecting, segmenting and categorizing small objects under uncertainty, with a focus on lung nodules that may appear in low-dose CT (LDCT) scans of the human chest. The AAM, ASM and ATM approaches are used for the first time on this application. A new formulation of object detection by template matching, cast as an energy optimization, is introduced. Nine similarity measures for matching have been quantitatively evaluated for detecting nodules less than 1 cm in diameter. Statistical methods that combine intensity, shape and spatial interaction are examined for segmenting small objects. Extensions of the intensity model using the linear combination of Gaussians (LCG) approach are introduced in order to estimate the number of modes in the LCG equation. The classical maximum a posteriori (MAP) segmentation approach has been adapted to handle small lung nodules that are randomly located in the lung tissue. A novel empirical approach has been devised to simultaneously detect and segment the lung nodules in LDCT scans. The level set method was also applied to lung nodule segmentation, with a new formulation of the energy function controlling the level set propagation that takes into account the specific properties of the nodules. Finally, a novel approach for classifying the segmented nodules into categories has been introduced. Geometric object descriptors such as SIFT, ASIFT, SURF and LBP have been used for feature extraction and matching of small lung nodules; the LBP has been found to be the most robust. Categorization implies classification of detected and segmented objects into classes or types. The object descriptors have been deployed in the detection step for false-positive reduction, and in the categorization stage to assign a class and type to the nodules. The AAM/ASM/ATM models have been used for the categorization stage. The front-end processes of lung nodule modeling, detection, segmentation and classification/categorization are model-based and data-driven. This dissertation is the first attempt in the literature at creating an entirely model-based approach to lung nodule analysis.
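    As a concrete illustration of the template-matching formulation, the sketch below scores candidate windows of a CT slice against a nodule template using normalized cross-correlation, one measure of the kind evaluated in the dissertation. It is a minimal sketch assuming grayscale NumPy arrays; the function names and the 0.6 threshold are illustrative assumptions, not the dissertation's exact formulation.

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation between an image patch and a nodule
    template -- one of several similarity measures that can drive
    template-matching detection."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p ** 2).sum() * (t ** 2).sum())
    return (p * t).sum() / denom if denom > 0 else 0.0

def detect_candidates(image, template, threshold=0.6):
    """Slide the template over the image and keep locations whose
    similarity exceeds the threshold (candidate nodule centres)."""
    th, tw = template.shape
    h, w = image.shape
    hits = []
    for y in range(h - th + 1):
        for x in range(w - tw + 1):
            score = ncc(image[y:y + th, x:x + tw], template)
            if score >= threshold:
                hits.append((y, x, score))
    return hits
```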

    A framework of face recognition with set of testing images

    Get PDF
    We propose a novel framework to solve the face recognition problem based on a set of testing images. Our framework can handle the case in which there is no pose overlap between the training set and the query set. The main techniques used in this framework are manifold alignment, face normalization and discriminant learning. Experiments on different databases show that our system outperforms several state-of-the-art methods.
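    A hypothetical sketch of the set-based decision step only: the framework's manifold-alignment and normalization stages are not reproduced here. It illustrates labelling a whole query set by pooling per-image predictions from a discriminant model, with scikit-learn's LDA as an assumed stand-in for the discriminant learning component.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def recognize_set(train_X, train_y, query_set):
    """Train a discriminant model on labelled gallery faces, then label
    an entire query set by majority vote over per-image predictions."""
    lda = LinearDiscriminantAnalysis()
    lda.fit(train_X, train_y)
    preds = lda.predict(query_set)       # one label per query image
    values, counts = np.unique(preds, return_counts=True)
    return values[np.argmax(counts)]     # consensus identity for the set
```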

    Building change detection from remotely sensed data using machine learning techniques

    Full text link
    As remote sensing data plays an increasingly important role in many fields, many countries have established geographic information systems. However, such systems usually suffer from obsolete scene details, making the development of change detection technology critical. Building changes are especially important in practice, as they are valuable in urban planning and disaster rescue. This thesis focuses on building change detection from remotely sensed data using machine learning techniques. Supervised classification is a traditional method for pixel-level change detection, and it relies on a suitable training dataset. Since different training datasets may affect learning performance differently, the effects of dataset characteristics on pixel-level building change detection are first studied. The research is conducted from two angles: imbalance and noise in the training dataset, and multiple correlations among different features. The robustness of several supervised learning algorithms to unbalanced and noisy training datasets is examined, and the results are interpreted from a theoretical perspective. A solution for handling multiple correlations is introduced, and its performance on, and applicability to, building change detection is investigated. Finally, an object-based post-processing technique is proposed that uses prior knowledge to further suppress false alarms. A novel corner-based Markov random field (MRF) method is then proposed for exploring spatial information and contextual relations in changed-building outline detection. Corners are treated as vertices in the graph, and a new method is proposed for determining neighbourhood relations. Energy terms in the proposed method are constructed from spatial features that describe building characteristics. An optimal solution indicates the spatial features belonging to changed buildings, and changed areas are revealed through novel linking processes. Considering the individual advantages of pixel-level, contextual and spatial features, an MRF-based combinational method is proposed that exploits spectral, spatial and contextual features in building change detection. It consists of pixel-level detection followed by corner-based refinement: pixel-level detection provides an initial indication of changed areas, and corner-based refinement then further refines the detection results. Experimental results and quantitative analysis demonstrate the capacity and effectiveness of the proposed methods.
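    A minimal sketch of the training-set imbalance issue at the pixel level: changed pixels are typically rare, so an unweighted learner can be biased toward the unchanged class. The class-weighted random forest below is an illustrative assumption, not necessarily one of the specific learners examined in the thesis.

```python
from sklearn.ensemble import RandomForestClassifier

def train_change_detector(features, labels):
    """Pixel-level change detection treated as binary classification.
    features: (n_pixels, n_features) array; labels: 0 = unchanged,
    1 = changed. Class weighting re-weights the rare 'changed' class."""
    clf = RandomForestClassifier(
        n_estimators=200,
        class_weight="balanced",   # one common mitigation for imbalance
        random_state=0,
    )
    clf.fit(features, labels)
    return clf
```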

    Multimodaalsel emotsioonide tuvastamisel põhineva inimese-roboti suhtluse arendamine (Developing human-robot interaction based on multimodal emotion recognition)

    Get PDF
    The electronic version of this thesis does not include the publications. Automatic multimodal emotion recognition is a fundamental subject of interest in affective computing, with its main applications in human-computer interaction. Systems developed for this purpose combine different modalities based on vocal and visual cues. This thesis draws on both modalities in order to develop an automatic multimodal emotion recognition system, taking advantage of the information extracted from speech and face signals. From the speech signal, Mel-frequency cepstral coefficients, filter-bank energies and prosodic features are extracted. Two different strategies are used to analyze the facial data. First, geometric relations between facial landmarks, i.e. distances and angles, are computed. Second, each emotional video is summarized into a reduced set of key-frames, and a convolutional neural network is trained on these key-frames to discriminate visually between the emotions. The output confidence values of all three classifiers (one acoustic, two visual) are then used to define a new feature space, and a final classifier is trained on it to predict the emotion label in a late-fusion scheme. Experiments are conducted on the SAVEE, Polish, Serbian, eNTERFACE'05 and RML datasets. The results show significant performance improvements by the proposed system in comparison to the existing alternatives, defining the current state of the art on all the datasets. Additionally, we review the emotional body gesture recognition systems proposed in the literature, with the aim of identifying future research directions that could add gesture-based emotion recognition to the proposed system and further improve its accuracy. In particular, incorporating gesture data, which constitute another major component of the visual modality, could yield a more effective framework.
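    A minimal sketch of the acoustic feature-extraction step, using librosa to compute MFCCs plus simple frame-energy statistics. Summarizing frame-level features into a fixed-length vector, and the parameter choices, are illustrative assumptions; the prosodic features described above are omitted.

```python
import librosa
import numpy as np

def speech_features(path, n_mfcc=13):
    """Extract MFCCs and RMS-energy statistics from a speech clip, as
    one building block of the acoustic classifier."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    energy = librosa.feature.rms(y=y)
    # Summarize frame-level features into a fixed-length vector.
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        energy.mean(axis=1), energy.std(axis=1),
    ])
```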

    Machine learning methods for sign language recognition: a critical review and analysis.

    Get PDF
    Sign language is an essential tool for bridging the communication gap between hearing and hearing-impaired people. However, the diversity of over 7000 present-day sign languages, with variability in hand motion, hand shape, and the position of body parts, makes automatic sign language recognition (ASLR) a complex problem. To overcome this complexity, researchers are investigating better ways of developing ASLR systems in search of intelligent solutions, and they have demonstrated remarkable success. This paper analyses the research published on intelligent systems for sign language recognition over the past two decades. A total of 649 publications related to decision support and intelligent systems for sign language recognition (SLR) are extracted from the Scopus database and analysed. The extracted publications are analysed with the bibliometric VOSviewer software to (1) obtain the publications' temporal and regional distributions and (2) map the cooperation networks between affiliations and authors and identify productive institutions in this context. Moreover, techniques for vision-based sign language recognition are reviewed, and the various feature extraction and classification techniques used in SLR to achieve good results are discussed. The literature review presented in this paper shows the importance of incorporating intelligent solutions into sign language recognition systems and reveals that a perfect intelligent system for sign language recognition is still an open problem. Overall, it is expected that this study will facilitate knowledge accumulation and the creation of intelligent SLR systems, and provide readers, researchers, and practitioners with a roadmap to guide future work.

    Face recognition using statistical adapted local binary patterns.

    Get PDF
    Biometrics is the study of methods for recognizing humans based on their behavioral and physical characteristics or traits. Face recognition is one of the biometric modalities that has received a great amount of attention from researchers during the past few decades because of its potential applications in a variety of security domains. Face recognition, however, is not only concerned with recognizing human faces, but also with recognizing the faces of non-biological entities, or avatars. The need for secure and affordable virtual worlds is attracting the attention of many researchers who seek fast, automatic and reliable ways to identify virtual worlds' avatars. In this work, I propose new techniques for recognizing avatar faces, which can also be applied to recognizing human faces. The proposed methods are based mainly on a well-known and efficient local texture descriptor, the Local Binary Pattern (LBP). I apply different versions of LBP, such as Hierarchical Multi-scale Local Binary Patterns and the Adaptive Local Binary Pattern with Directional Statistical Features, in the wavelet space, and discuss the effect of this application on the performance of each LBP version. In addition, I use a new version of LBP called the Local Difference Pattern (LDP), together with other well-known descriptors and classifiers, to differentiate between human and avatar face images. The original LBP achieves a high recognition rate if the test images are clean, but its performance degrades if the images are corrupted by noise. To deal with this problem, I propose a new definition of the original LBP in which the descriptor does not threshold all the neighborhood pixels at the central pixel's value. Instead, a weight is computed for each pixel in the neighborhood, a new value is calculated for each pixel, and simple statistical operations are then used to compute the new threshold, which changes automatically based on the pixel values. This threshold can be applied with the original LBP or any other version of LBP, and can be extended to work with the Local Ternary Pattern (LTP) or any version of LTP, producing variants suited to recognizing noisy avatar and human face images.
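    A minimal sketch of the adaptive-threshold idea: plain LBP thresholds the eight neighbours at the centre pixel's value, whereas the variant below derives the threshold from a simple neighbourhood statistic so that it shifts with local intensity. The dissertation's per-pixel weighting scheme is not specified here, so the plain neighbourhood mean is an assumed stand-in.

```python
import numpy as np

def adaptive_lbp(image):
    """Compute an LBP-style code map where each pixel's 8 neighbours are
    thresholded at a local statistic (here the neighbourhood mean)
    instead of the centre pixel's value."""
    h, w = image.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neigh = [image[y + dy, x + dx] for dy, dx in offsets]
            thresh = np.mean(neigh)          # adaptive local threshold
            bits = [int(v >= thresh) for v in neigh]
            codes[y - 1, x - 1] = sum(b << i for i, b in enumerate(bits))
    return codes
```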