33 research outputs found

    Deep-learning feature descriptor for tree bark re-identification

    The ability to visually re-identify objects is a fundamental capability of vision systems. Oftentimes, it relies on collections of visual signatures based on descriptors such as SIFT or SURF. However, these traditional descriptors were designed for a certain domain of surface appearances and geometries (limited relief). Consequently, highly textured surfaces such as tree bark pose a challenge to them. In turn, this makes it more difficult to use trees as identifiable landmarks for navigational purposes (robotics) or to track felled lumber along a supply chain (logistics). We thus propose to use data-driven descriptors trained on bark images for tree surface re-identification. To this effect, we collected a large dataset containing 2,400 bark images with strong illumination changes, annotated by surface and with the ability to pixel-align them. We used this dataset to sample more than 2 million 64x64 pixel patches to train our novel local descriptors DeepBark and SqueezeBark. Our DeepBark method has shown a clear advantage over the hand-crafted descriptors SIFT and SURF. For instance, we demonstrated that DeepBark can reach a mAP of 87.2% when retrieving the 11 bark images relevant to a query, i.e. corresponding to the same physical surface, from a set of 7,900 images. Our work thus suggests that re-identifying tree surfaces in a challenging illumination context is possible. We also make our dataset public, so that it can be used to benchmark surface re-identification techniques
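As an illustration of the retrieval protocol behind the reported mAP figure, the sketch below ranks database images by descriptor distance and averages per-query precision. This is a hedged reconstruction, not the authors' code: the helper names are hypothetical, and the use of L2 distance between per-image descriptors is an assumption.

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP for one query: mean of precision@k taken at each relevant hit."""
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return float(np.mean(precisions)) if precisions else 0.0

def mean_average_precision(query_descs, db_descs, db_labels, query_labels):
    """Rank database images by L2 distance in descriptor space, then average AP."""
    aps = []
    for q, q_lab in zip(query_descs, query_labels):
        dists = np.linalg.norm(db_descs - q, axis=1)   # distance to every database image
        order = np.argsort(dists)                      # closest first
        ranked_rel = (db_labels[order] == q_lab)       # relevant = same physical surface
        aps.append(average_precision(ranked_rel))
    return float(np.mean(aps))
```

In the paper's setting, each query would have 11 relevant images among 7,900 database images.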

    An Overview of Bag of Words;Importance, Implementation, Applications, and Challenges

    Abstract—Over the past fifteen years, the use of the Bag of Words (BoW) method in computer vision has grown visibly. Beyond text classification and texture recognition, it is also used for image and video classification, robot localization, etc., and it is one of the most common methods for categorizing text and objects. In text classification, BoW records the number of occurrences of each word for each instance, disregarding word order and grammar. In visual scene classification, it is based on clusters of local descriptors extracted from the images, again disregarding the order of the clusters. The key idea is to generate a histogram of the words in a document, or of the features in an image, to represent that document or image. The BoW method is computationally, and even conceptually, simpler than many other classification methods. For that reason, BoW-based systems have recorded new, higher performance scores on commonly used benchmarks for text and image classification. This paper presents an overview of BoW: its importance, how it works, its applications, and the challenges of using it. The study is useful for introducing the BoW method to new researchers, and for providing a good background with associated related work to researchers working on the model
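The core histogram step the abstract describes, quantizing local descriptors to "visual words" and discarding their order, can be sketched in a few lines. This is an illustrative numpy sketch; in practice the codebook would come from k-means over training descriptors, which is omitted here.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize each local descriptor to its nearest codeword (visual word)
    and return an L1-normalized word-count histogram for the image."""
    # pairwise distances: (n_descriptors, n_words)
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = d.argmin(axis=1)                                  # nearest codeword index
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                                  # order of words is discarded
```

The resulting fixed-length histogram is what a downstream classifier consumes, regardless of how many descriptors the image produced.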

    Automatic image classification based on descriptor fusion and supervised machine learning (Automatska klasifikacija slika zasnovana na fuziji deskriptora i nadgledanom maơinskom učenju)

    This thesis investigates possibilities for fusion, i.e. combining different types of image descriptors, in order to improve the accuracy and efficiency of image classification. A broad range of techniques for fusing color and texture descriptors was analyzed, belonging to two approaches: early fusion and late fusion. Early fusion combines descriptors during the extraction phase, while late fusion combines the classification results of independent classifiers. An efficient algorithm for extracting a compact image descriptor based on early fusion of texture and color information is proposed in the thesis. Experimental evaluation of the algorithm demonstrated a good compromise between efficiency and accuracy of classification. Research on the late fusion approach focused on artificial neural networks and a recently introduced algorithm for extremely fast training of neural networks known as the Extreme Learning Machine (ELM). The main disadvantages of ELM are insufficient stability and limited accuracy of results. To overcome these problems, a technique for combining the results of multiple ELMs into a single classifier, based on probability sum rules, is proposed. The resulting ensemble of ELMs demonstrated a significant improvement in accuracy and stability of results compared with an individual ELM. To further improve classification accuracy, the thesis also proposes a novel hierarchical method for late fusion of multiple complementary descriptors using ELM classifiers. In the first phase of the proposed method, a separate ensemble of ELM classifiers is trained for every descriptor. In the second phase, an additional ELM-based classifier is introduced to learn the optimal combination of descriptors for every category. This approach enables the system to choose the descriptors that are most representative of each category.
Comparative evaluation on several benchmark datasets demonstrated highly accurate classification results, comparable to state-of-the-art methods
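A minimal sketch of the two ingredients the thesis combines, an ELM trained in closed form and a sum-rule ensemble of several ELMs, might look as follows. The sigmoid activation, hidden-layer size, and pseudo-inverse solution are standard ELM choices assumed for illustration, not details taken from the thesis.

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine: random hidden layer, closed-form output weights."""
    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))    # sigmoid activations

    def fit(self, X, y, n_classes):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # random, never trained
        self.b = self.rng.normal(size=self.n_hidden)
        T = np.eye(n_classes)[y]                                # one-hot targets
        self.beta = np.linalg.pinv(self._hidden(X)) @ T         # least-squares output weights
        return self

    def scores(self, X):
        return self._hidden(X) @ self.beta

def ensemble_predict(elms, X):
    """Sum-rule fusion: add the per-class scores of all ELMs, then take the argmax."""
    total = sum(e.scores(X) for e in elms)
    return total.argmax(axis=1)
```

Because each ELM draws different random hidden weights, averaging several of them smooths out the instability of any single network, which is the motivation the abstract gives for the ensemble.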

    Robust Extreme Learning Machine for Modeling with Unknown Noise

    Extreme learning machine (ELM) is an emerging machine learning technique for training single-hidden-layer feedforward networks (SLFNs). During the training phase, an ELM model is created by simultaneously minimizing the modeling errors and the norm of the output weights. Squared loss is usually adopted in the objective function of ELMs, which is theoretically optimal for a Gaussian error distribution. In practice, however, data collected from uncertain and heterogeneous environments often contain unknown noise, which may be very complex and cannot be described well by any single distribution. To tackle this issue, this paper proposes a robust ELM (R-ELM) to improve modeling capability and robustness under both Gaussian and non-Gaussian noise. In R-ELM, a modified objective function is constructed that fits the noise with a mixture of Gaussians (MoG), which can approximate any continuous distribution. In addition, the corresponding solution for the new objective function is developed based on the expectation-maximization (EM) algorithm. Comprehensive experiments, on both selected benchmark datasets and real-world applications, demonstrate that the proposed R-ELM has better robustness and generalization performance than state-of-the-art machine learning approaches
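The noise-modeling idea can be illustrated with a one-dimensional EM fit of a Gaussian mixture to modeling residuals. This is a simplified sketch only: the actual R-ELM alternates between updating the MoG noise model and the output weights, which is omitted here, and the quantile-based initialization is an assumption made for the example.

```python
import numpy as np

def fit_mog_residuals(r, k=2, n_iter=100):
    """EM for a k-component 1-D Gaussian mixture on modeling residuals r.
    Returns mixing weights, means, and variances -- the flexible noise model
    that replaces the single-Gaussian assumption behind squared loss."""
    pi = np.full(k, 1.0 / k)
    mu = np.quantile(r, np.linspace(0.1, 0.9, k))   # spread the initial means
    var = np.full(k, r.var() + 1e-6)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each residual
        dens = pi * np.exp(-(r[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = resp.sum(axis=0)
        pi = nk / len(r)
        mu = (resp * r[:, None]).sum(axis=0) / nk
        var = (resp * (r[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    return pi, mu, var
```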

    Device-free localization via an extreme learning machine with parameterized geometrical feature extraction

    © 2017 by the authors. Licensee MDPI, Basel, Switzerland. Device-free localization (DFL) is becoming one of the new technologies in the wireless localization field, due to its advantage that the target to be localized does not need to be attached to any electronic device. In a radio-frequency (RF) DFL system, radio transmitters (RTs) and radio receivers (RXs) sense the target collaboratively, and the location of the target can be estimated by fusing the changes in the received signal strength (RSS) measurements associated with the wireless links. In this paper, we propose an extreme learning machine (ELM) approach for DFL to improve the efficiency and accuracy of the localization algorithm. Different from conventional machine learning approaches for wireless localization, in which differential RSS measurements are typically used as the only input features, we introduce a parameterized geometrical representation for an affected link, which consists of its geometrical intercepts and its differential RSS measurement. Parameterized geometrical feature extraction (PGFE) is performed for the affected links, and the features are used as the inputs of the ELM. The proposed PGFE-ELM for DFL is trained in an offline phase and performs real-time localization in an online phase, where the estimated location of the target is obtained through the trained ELM. PGFE-ELM has the advantages that the affected links used by the ELM in the online phase can differ from those used for training in the offline phase, and that it is more robust in dealing with uncertain combinations of the detectable wireless links. Experimental results show that the proposed PGFE-ELM can improve localization accuracy and learning speed significantly compared with a number of existing machine learning and DFL approaches, including the weighted K-nearest neighbor (WKNN), support vector machine (SVM), and back-propagation neural network (BPNN), as well as the well-known radio tomographic imaging (RTI) DFL approach
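For context, the WKNN baseline mentioned above is simple to sketch: it treats the vector of RSS changes as a fingerprint and returns the distance-weighted average of the locations of the closest training fingerprints. This is an illustrative sketch of the generic baseline, not the paper's implementation.

```python
import numpy as np

def wknn_localize(rss_query, rss_train, loc_train, k=3, eps=1e-9):
    """Weighted K-nearest-neighbour localization: the target location is the
    distance-weighted average of the k training locations whose RSS-change
    fingerprints are closest to the query fingerprint."""
    d = np.linalg.norm(rss_train - rss_query, axis=1)   # fingerprint distances
    idx = np.argsort(d)[:k]                             # k nearest fingerprints
    w = 1.0 / (d[idx] + eps)                            # closer fingerprints weigh more
    return (w[:, None] * loc_train[idx]).sum(axis=0) / w.sum()
```

PGFE-ELM's claimed advantage over this kind of fingerprint matching is that its geometrical link features generalize across link combinations rather than requiring the same links at training and test time.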

    Banknote Authentication and Medical Image Diagnosis Using Feature Descriptors and Deep Learning Methods

    Banknote recognition and medical image analysis have been foci of image processing and pattern recognition research. Counterfeiters have taken advantage of innovations in print media technology to reproduce fake money, hence the need for systems that can protect citizens and reassure them of the authenticity of the banknotes in circulation. Similarly, many physicians must interpret medical images, but image analysis by humans is susceptible to error due to wide variations across interpreters, lethargy, and human subjectivity. Computer-aided diagnosis is vital to improving medical analysis, as it facilitates the identification of findings that need treatment and assists the expert's workflow. This thesis is therefore organized around three problems related to banknote authentication and medical image diagnosis. In our first research problem, we proposed a new banknote recognition approach that classifies the principal components of extracted HOG features. We further experimented with computing HOG descriptors from cells created from the image-patch vertices of SURF points, and designed a feature reduction approach based on a high-correlation and low-variance filter. In our second research problem, we developed a mobile app for banknote identification and counterfeit detection using the Unity 3D software and evaluated its performance based on a cascaded ensemble approach. The algorithm was then extended to a client-server architecture using SIFT and SURF features reduced by Bag of Words, together with high-correlation-based HOG vectors. In our third research problem, experiments were conducted on a pre-trained mobile app for medical image diagnosis using three convolutional layers with an ensemble classifier comprising PCA and bagging of five base learners. We also implemented a Bidirectional Generative Adversarial Network to mitigate the effect of the binary cross-entropy loss, with a Deep Convolutional Generative Adversarial Network as the generator and encoder and a Capsule Network as the discriminator, while experimenting on images with random composition and translation inferences. Lastly, we proposed a variant of single-image super-resolution for medical analysis by redesigning the Super-Resolution Generative Adversarial Network to increase the peak signal-to-noise ratio during image reconstruction, incorporating a loss function based on the mean squared error of pixel space and Super-Resolution Convolutional Neural Network layers
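A stripped-down version of the HOG computation underlying the first contribution can be sketched as follows. This is illustrative only: full HOG additionally L2-normalizes overlapping blocks of cells before the PCA reduction described in the abstract, and the cell size and bin count here are conventional defaults, not the thesis' settings.

```python
import numpy as np

def hog_cells(img, cell=8, n_bins=9):
    """Simplified HOG: per-cell histograms of gradient orientation, weighted
    by gradient magnitude (block normalization omitted)."""
    gy, gx = np.gradient(img.astype(float))              # vertical / horizontal gradients
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)              # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    h, w = img.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            b = bins[i:i + cell, j:j + cell].ravel()
            m = mag[i:i + cell, j:j + cell].ravel()
            feats.append(np.bincount(b, weights=m, minlength=n_bins))
    return np.concatenate(feats)
```

The concatenated per-cell histograms form the high-dimensional vector that PCA then compresses for classification.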

    Classifier Ensemble Feature Selection for Automatic Fault Diagnosis

    An efficient ensemble feature selection scheme for fault diagnosis is proposed, based on three hypotheses: (a) a fault diagnosis system does not need to be restricted to a single feature extraction model; on the contrary, it should use as many feature models as possible, since the extracted features are potentially discriminative and the pooled feature set is subsequently reduced by feature selection; (b) the feature selection process can be accelerated, without loss of classification performance, by combining feature selection methods, such that faster but weaker methods remove potentially non-discriminative features and send a smaller, filtered feature set to slower but stronger methods; (c) the optimal feature set for a multi-class problem may be different for each pair of classes, so feature selection should be done using a one-versus-one scheme, even when multi-class classifiers are used. However, since the number of classifiers grows exponentially with the number of classes, expensive techniques like Error-Correcting Output Codes (ECOC) might have a prohibitive computational cost for large datasets, so a fast one-versus-one approach must be used to alleviate this computational demand. These three hypotheses are corroborated by experiments. The main hypothesis of this work is that using the three approaches together significantly improves the classification performance of a classifier identifying conditions in industrial processes. Experiments have shown such an improvement for the 1-NN classifier in the industrial processes used as case studies.
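Hypotheses (b) and (c) can be sketched together: a cheap variance filter first prunes the pooled features, then a stronger ranking selects features separately for each class pair. The concrete filters below (variance, then Fisher score) are stand-ins chosen for illustration, not the methods used in the thesis.

```python
import numpy as np
from itertools import combinations

def fisher_score(X, y):
    """Fisher criterion per feature for a two-class problem (the slower, stronger stage)."""
    a, b = X[y == 0], X[y == 1]
    return (a.mean(0) - b.mean(0)) ** 2 / (a.var(0) + b.var(0) + 1e-12)

def ovo_cascade_select(X, y, var_keep=0.5, top_k=2):
    """Cascaded one-versus-one selection: for each pair of classes, a cheap
    variance filter prunes weak features, then the Fisher score ranks the
    survivors, so each class pair gets its own feature subset."""
    selected = {}
    for c1, c2 in combinations(np.unique(y), 2):
        mask = (y == c1) | (y == c2)
        Xp, yp = X[mask], (y[mask] == c2).astype(int)
        v = Xp.var(axis=0)
        keep = np.argsort(v)[::-1][:max(1, int(len(v) * var_keep))]  # stage 1: fast filter
        score = fisher_score(Xp[:, keep], yp)                        # stage 2: strong ranking
        selected[(c1, c2)] = keep[np.argsort(score)[::-1][:top_k]]
    return selected
```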

    Automated screening methods for mental and neuro-developmental disorders

    Mental and neuro-developmental disorders such as depression, bipolar disorder, and autism spectrum disorder (ASD) are critical healthcare issues which affect a large number of people. Depression, according to the World Health Organisation, is the largest cause of disability worldwide and affects more than 300 million people. Bipolar disorder affects more than 60 million individuals worldwide. ASD, meanwhile, affects more than 1 in 100 people in the UK. Not only do these disorders adversely affect the quality of life of affected individuals, they also have a significant economic impact. While brute-force approaches are potentially useful for learning new features which could be representative of these disorders, such approaches may not be best suited for developing robust screening methods. This is due to a myriad of confounding factors, such as age, gender, cultural background, and socio-economic status, which can affect the social signals of individuals in a similar way to the symptoms of these disorders. Brute-force approaches may learn to exploit the effects of these confounding factors on social signals in place of effects due to mental and neuro-developmental disorders. The main objective of this thesis is to develop, investigate, and propose computational methods to screen for mental and neuro-developmental disorders in accordance with the descriptions given in the Diagnostic and Statistical Manual (DSM). The DSM is a guidebook published by the American Psychiatric Association which offers a common language on mental disorders. Our motivation is to alleviate, to an extent, the possibility of machine learning algorithms picking up on one of the confounding factors to optimise performance on the dataset, something which we do not find uncommon in the research literature.
To this end, we introduce three new methods for automated screening for depression from audio/visual recordings, namely: turbulence features, craniofacial movement features, and a Fisher Vector based representation of speech spectra. We surmise that psychomotor changes due to depression lead to uniqueness in an individual's speech pattern, which manifests as sudden and erratic changes in speech feature contours. The efficacy of these features is demonstrated as part of our solution to the Audio/Visual Emotion Challenge 2017 (AVEC 2017) on depression severity prediction. We also detail a methodology to quantify specific craniofacial movements, which we hypothesised could be indicative of psychomotor retardation, and hence depression. The efficacy of craniofacial movement features is demonstrated using datasets from the 2014 and 2017 editions of the AVEC depression severity prediction challenges. Finally, using the dataset provided as part of the AVEC 2016 depression classification challenge, we demonstrate that differences between the speech of individuals with and without depression can be quantified effectively using the Fisher Vector representation of speech spectra. For our work on automated screening of bipolar disorder, we propose methods to classify individuals with bipolar disorder into states of remission, hypo-mania, and mania. Here, we surmise that, as with depression, individuals with different levels of mania have a certain uniqueness to their social signals. Based on this understanding, we propose the use of turbulence features for audio/visual social signals (i.e. speech and facial expressions). We also propose the use of Fisher Vectors to create a unified representation of speech in terms of prosody, voice quality, and speech spectra. These methods have been proposed as part of our solution to the AVEC 2018 bipolar disorder challenge. In addition, we find that the task of automated screening for ASD is much more complicated.
Here, confounding factors can easily overwhelm the social signals which are affected by ASD. We discuss, in the light of the research literature and our experimental analysis, that significant collaborative work is required between computer scientists and clinicians to discern social signals which are robust to common confounding factors
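As a rough illustration of quantifying "sudden and erratic changes in speech feature contours", one could compute frame-to-frame delta statistics and the rate of direction reversals of a contour. The thesis' exact turbulence features are not specified in the abstract, so this is only a hypothetical proxy, not the proposed method.

```python
import numpy as np

def contour_change_stats(contour):
    """Illustrative 'erratic change' statistics for a feature contour:
    frame-to-frame delta statistics and the fraction of direction reversals.
    (NOT the thesis' turbulence features, whose definition is not given here.)"""
    delta = np.diff(np.asarray(contour, dtype=float))       # frame-to-frame changes
    reversals = np.sum(np.diff(np.sign(delta)) != 0) / max(len(delta) - 1, 1)
    return {
        "mean_abs_delta": float(np.mean(np.abs(delta))),
        "max_abs_delta": float(np.max(np.abs(delta))),
        "reversal_rate": float(reversals),                  # fraction of turning points
    }
```

A smooth contour yields a reversal rate near zero, while a jittery one approaches one, which is the kind of contrast such features aim to capture.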