9 research outputs found

    An efficient radiographic Image Retrieval system using Convolutional Neural Network


    The outperformance of the semantic learning machine, against commonly used algorithms, for binary and multi-class medical image classification: combined with the usage of feature extraction by several convolutional neural networks

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. Extensive recent research has shown the importance of innovation in medical healthcare, with a focus on pneumonia. Predicting pneumonia cases as quickly as possible, and preferably before symptoms appear, is vital and can save lives. An online database source gathered pneumonia-specific image data recording not just the presence of the infection but also its nature, divided into bacterial and viral infection. The first achievement is extracting valuable information from the X-ray image datasets: using several ImageNet pre-trained CNNs, knowledge can be gained from the images and transferred into numeric arrays. This binary and multi-class classification data requires a sophisticated prediction algorithm that recognizes X-ray image patterns. Multiple recent experiments show promising results for the innovative Semantic Learning Machine (SLM), which is essentially a geometric semantic hill climber for feedforward neural networks. The SLM is based on a derivation of the Geometric Semantic Genetic Programming (GSGP) mutation operator for real-valued semantics. To demonstrate the outperformance of the binary and multi-class SLM in general, this research compares it against a selection of commonly used algorithms. A comprehensive hyperparameter optimization is performed for algorithms commonly applied to these kinds of real-life problems: Random Forest, Support Vector Machine, K-Nearest Neighbors, and Neural Networks. The results of the SLM are promising for the pneumonia application, and the approach could be used for all types of image-based predictions in combination with CNN feature extraction.
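The pipeline above turns X-ray images into fixed-length numeric arrays via CNN feature extraction before any classifier is trained. A minimal, self-contained sketch of that idea, using a randomly initialized toy convolution in place of an ImageNet-pretrained network (all names and sizes here are illustrative, not from the dissertation):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 2-D 'valid' cross-correlation for one channel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def extract_features(img, kernels):
    """Conv -> ReLU -> global average pooling: one number per filter.
    A pretrained CNN does the same with learned filters and many layers."""
    feats = []
    for k in kernels:
        fmap = np.maximum(conv2d_valid(img, k), 0.0)  # ReLU
        feats.append(fmap.mean())                      # global average pool
    return np.array(feats)

rng = np.random.default_rng(0)
image = rng.random((32, 32))               # stand-in for an X-ray image
filters = rng.standard_normal((8, 3, 3))   # 8 random 3x3 filters (not pretrained!)
features = extract_features(image, filters)  # fixed-length numeric array
```

The resulting `features` vector is what a downstream classifier (SLM, Random Forest, SVM, ...) would consume; in the dissertation the filters come from ImageNet-pretrained CNNs rather than random initialization.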

    Radon Projections as Image Descriptors for Content-Based Retrieval of Medical Images

    Clinical analysis and medical diagnosis of diverse diseases adopt medical imaging techniques to empower specialists to perform their tasks by visualizing internal body organs and tissues, enabling diseases to be classified and treated at an early stage. Content-Based Image Retrieval (CBIR) systems are a set of computer vision techniques for retrieving similar images from a large database based on proper image representations. Particularly in radiology and histopathology, CBIR is a promising approach to effectively screen, understand, and retrieve images with a similar level of semantic description from a database of previously diagnosed cases, providing physicians with reliable assistance for diagnosis, treatment planning, and research. Over the past decade, the development of CBIR systems in medical imaging has accelerated due to the increase in digitized modalities, the increase in computational efficiency (e.g., availability of GPUs), and progress in algorithm development in computer vision and artificial intelligence. Hence, medical specialists may use CBIR prototypes to query similar cases from a large image database based solely on the image content (and no text). Understanding the semantics of an image requires an expressive descriptor that has the ability to capture and represent the unique and invariant features of an image. The Radon transform, one of the oldest techniques widely used in medical imaging, can capture the shape of organs in the form of a one-dimensional histogram by projecting parallel rays through a two-dimensional object of concern at a specific angle. In this work, the Radon transform is re-designed to (i) extract features and (ii) generate a descriptor for content-based retrieval of medical images. The Radon transform is applied to feed a deep neural network instead of raw images in order to improve the generalization of the network.
Specifically, the framework provides Radon projections of an image to a deep autoencoder, from which the deepest layer is isolated and fed into a multi-layer perceptron for classification. This approach enables the network to (a) train much faster, as the Radon projections are computationally inexpensive compared to raw input images, and (b) perform more accurately, as Radon projections present more pronounced and salient features to the network than raw images. This framework is validated on a publicly available radiography data set called "Image Retrieval in Medical Applications" (IRMA), consisting of 12,677 training and 1,733 test images, for which a classification accuracy of approximately 82% is achieved, outperforming all autoencoder strategies reported on the IRMA data set. The classification accuracy is calculated by dividing the total IRMA error, a calculation outlined by the authors of the data set, by the total number of test images. Finally, a compact handcrafted image descriptor based on the Radon transform was designed in this work, called "Forming Local Intersections of Projections" (FLIP). The FLIP descriptor has been designed, through numerous experiments, for representing histopathology images. It is based on the Radon transform, wherein parallel projections are applied in local 3x3 neighborhoods with a 2-pixel overlap on gray-level images (the staining of histopathology images is ignored). Using four equidistant projection directions in each window, the characteristics of the neighborhood are quantified by taking an element-wise minimum between each pair of adjacent projections in each window. Thereafter, the FLIP histogram (descriptor) for each image is constructed. A multi-resolution FLIP (mFLIP) scheme is also proposed, which is observed to outperform many state-of-the-art methods, among them deep features, when applied to the histopathology data set KIMIA Path24.
Experiments show a total classification accuracy of approximately 72% using SVM classification, which surpasses the current benchmark of approximately 66% on the KIMIA Path24 data set.
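The Radon transform described above reduces a 2-D image to 1-D projections by summing intensities along parallel rays. A minimal sketch for the two axis-aligned angles (0° and 90°), which reduce to column and row sums; function and variable names are illustrative only:

```python
import numpy as np

def radon_axis_projections(img):
    """Radon projections at 0 and 90 degrees.
    Projecting vertical rays sums each column; horizontal rays sum each row.
    Projections at other angles would require rotating the image first."""
    return {
        0: img.sum(axis=0),    # one bin per column
        90: img.sum(axis=1),   # one bin per row
    }

# A bright vertical bar shows up as a single peak in the 0-degree projection.
img = np.zeros((5, 5))
img[:, 2] = 1.0
proj = radon_axis_projections(img)
```

This is the basic operation that FLIP repeats locally: four such projections per 3x3 window, combined by element-wise minima between adjacent directions.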

    Text-detection and -recognition from natural images

    Text detection and recognition from images could have numerous functional applications for document analysis, such as assistance for visually impaired people; recognition of vehicle license plates; evaluation of articles containing tables, street signs, maps, and diagrams; keyword-based image exploration; document retrieval; recognition of parts within industrial automation; content-based extraction; object recognition; address block location; and text-based video indexing. This research exploited the advantages of artificial intelligence (AI) to detect and recognise text from natural images. Machine learning and deep learning were used to accomplish this task. In this research, we conducted an in-depth literature review of the current detection and recognition methods used by researchers to identify the existing challenges, wherein differences in text resulting from disparity in alignment, style, size, and orientation, combined with low image contrast and complex backgrounds, make automatic text extraction a considerably challenging and problematic task. As a result, state-of-the-art approaches obtain low detection rates (often less than 80%) and recognition rates (often less than 60%), which has led to the development of new approaches. The aim of the study was to develop a robust text detection and recognition method for natural images with high accuracy and recall, which served as the target of the experiments. This method could detect all the text in scene images, despite certain specific features associated with the text pattern.
Furthermore, we aimed to find a solution to the two main problems concerning arbitrarily shaped text (horizontal, multi-oriented, and curved text) detection and recognition in low-resolution scenes with various scales and sizes. In this research, we propose a methodology that handles the problem of text detection by using a novel feature combination and selection approach for the classification of text/non-text regions. The text-region candidates were extracted from grey-scale images using the MSER technique. A machine learning-based method was then applied to refine and validate the initial detection. The effectiveness of features based on the aspect ratio, GLCM, LBP, and HOG descriptors was investigated. The text-region classifiers of MLP, SVM, and RF were trained using selections of these features and their combinations. The publicly available datasets ICDAR 2003 and ICDAR 2011 were used to evaluate the proposed method. This method achieved state-of-the-art performance using machine learning methodologies on both databases, and the improvements were significant in terms of Precision, Recall, and F-measure. The F-measure for ICDAR 2003 and ICDAR 2011 was 81% and 84%, respectively. The results showed that the use of a suitable feature combination and selection approach could significantly increase the accuracy of the algorithms. A new dataset has been proposed to fill the gap in character-level annotation and in the availability of text in different orientations and of curved text. The proposed dataset was created particularly for deep learning methods, which require a massive and varied range of training data. It includes 2,100 images annotated at the character and word levels, yielding 38,500 samples of English characters and 12,500 words. Furthermore, an augmentation tool has been proposed to support the proposed dataset.
The lack of an existing augmentation tool for object detection motivated the proposed tool, which is able to update the positions of bounding boxes after applying transformations to images. This technique helps to increase the number of samples in the dataset and reduces annotation time, since no manual re-annotation is required. The final part of the thesis presents a novel approach for text spotting: a new framework for an end-to-end character detection and recognition system designed using an improved SSD convolutional neural network, wherein layers are added to the SSD network and the aspect ratio of the characters is taken into account because it differs from that of other objects. Compared with the other methods considered, the proposed method could detect and recognise characters by training the end-to-end model completely. The performance of the proposed method was best on the proposed dataset, at 90.34. Furthermore, the F-measure of the method's accuracy on ICDAR 2015, ICDAR 2013, and SVT was 84.5, 91.9, and 54.8, respectively. On ICDAR 2013, the method achieved the second-best accuracy. The proposed method could spot text in arbitrarily shaped (horizontal, oriented, and curved) scene text.
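The text/non-text classifiers described above are trained on features such as the aspect ratio and HOG descriptors of candidate regions. A simplified, self-contained sketch of a HOG-style orientation histogram plus aspect ratio for one candidate patch (a toy version; real HOG additionally uses cells, blocks, and block normalization):

```python
import numpy as np

def patch_features(patch, n_bins=9):
    """Gradient-orientation histogram (HOG-like) plus aspect ratio."""
    gy, gx = np.gradient(patch.astype(float))      # per-axis gradients
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    hist = np.zeros(n_bins)
    bin_idx = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    for b in range(n_bins):
        hist[b] = mag[bin_idx == b].sum()          # magnitude-weighted votes
    aspect_ratio = patch.shape[1] / patch.shape[0]
    return np.append(hist, aspect_ratio)

rng = np.random.default_rng(1)
candidate = rng.random((16, 32))       # a 16x32 text-region candidate
feats = patch_features(candidate)      # 9 orientation bins + aspect ratio
```

Feature vectors like `feats`, computed for each MSER candidate, are what classifiers such as MLP, SVM, or RF would be trained on to separate text from non-text regions.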

    Machine Learning-based Detection of Compensatory Balance Responses and Environmental Fall Risks Using Wearable Sensors

    Falls are the leading cause of fatal and non-fatal injuries among seniors worldwide, with serious and costly consequences. Compensatory balance responses (CBRs) are reactions to recover stability following a loss of balance, potentially resulting in a fall if sufficient recovery mechanisms are not activated. While the performance of CBRs is a demonstrated risk factor for falls in seniors, the frequency, type, and underlying cause of these incidents in everyday life have not been well investigated. This study was spawned by the lack of research on fall risk assessment methods that can be used for continuous and long-term mobility monitoring of the geriatric population, during activities of daily living and in their dwellings. Wearable sensor systems (WSS) offer a promising approach for continuous real-time detection of gait and balance behavior to assess the risk of falling during activities of daily living. To detect CBRs, we record movement signals (e.g., acceleration) and the activity patterns of four muscles involved in maintaining balance, using wearable inertial measurement units (IMUs) and surface electromyography (sEMG) sensors. To develop more robust detection methods, we investigate machine learning approaches (e.g., support vector machines, neural networks) and successfully detect lateral CBRs during normal gait with accuracies of 92.4% and 98.1% using sEMG and IMU signals, respectively. Moreover, to detect environmental fall-related hazards that are associated with CBRs and affect the balance control behavior of seniors, we employ an egocentric mobile vision system mounted on participants' chests. Two algorithms (Gabor Barcodes and Convolutional Neural Networks) are developed. Our vision-based method detects 17 different classes of environmental risk factors (e.g., stairs, ramps, curbs) with 88.5% accuracy.
To the best of the authors' knowledge, this study is the first to develop and evaluate an automated vision-based method for fall hazard detection.
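Classifiers like those above are typically trained on features computed over sliding windows of the IMU acceleration signal rather than on raw samples. A minimal sketch of that preprocessing step (window length, hop size, and feature choices here are illustrative, not the study's actual settings):

```python
import numpy as np

def window_features(signal, win=50, hop=25):
    """Split a 1-D acceleration signal into overlapping windows and compute
    simple statistical features for each; the resulting matrix is what a
    classifier (SVM, neural network, ...) would consume."""
    feats = []
    for start in range(0, len(signal) - win + 1, hop):
        w = signal[start:start + win]
        feats.append([w.mean(), w.std(), np.sqrt(np.mean(w ** 2))])  # mean, std, RMS
    return np.array(feats)

rng = np.random.default_rng(2)
accel = rng.standard_normal(500)   # stand-in for one IMU acceleration channel
X = window_features(accel)         # one feature row per window
```

Each row of `X` would be paired with a label (CBR vs. normal gait) to train the detector; sEMG channels could be windowed and featurized the same way.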

    Writer identification and verification using textural features and dissimilarity

    Abstract: Writer verification and identification are tasks related to forensic science, where they serve to help identify writers or detect fraud in handwritten documents. Verifying or identifying writers from their handwriting on paper is an arduous task due to the similarities between the writing of different writers and also the variability in the writing of a single person. In this context, this work discusses the use of texture descriptors for the writer verification and identification process. Three different texture descriptors were evaluated for this thesis: GLCM (Gray Level Co-occurrence Matrix), LBP (Local Binary Pattern), and LPQ (Local Phase Quantization). In addition, we employ a classification scheme based on the dissimilarity representation, which has contributed to success in writer verification problems. Initially we address questions such as the performance of the descriptors and the parameters of the writer-independent system. We also examine other important questions related to the dissimilarity representation, such as the impact of the number of references used for writer verification and identification, and the number of writers used in the training set. From these first experiments, it was possible to verify that the number of writers in the training set had less impact on system performance than previously assumed. To pursue all these objectives, we performed experiments with two different databases: BFL (Brazilian Forensic Letter Database) and IAM (Institut für Informatik und angewandte Mathematik), which are handwritten in different languages and contain different numbers of writers. Next, we compared the dissimilarity-based approach with other writer-dependent strategies.
In a second round of experiments we evaluated the impact of different writing styles: text-dependent, text-independent, upper case, and forgery (disguised writing). For this we used the Firemaker database, the only public database containing these four different styles. Finally, we evaluated a writer selection approach whose goal is to select writers for generating robust models. Through a series of experiments, we found that both the LBP and LPQ texture descriptors are able to surpass previous results reported in the literature for the verification problem by about 5 percentage points. For the writer identification problem, the LPQ descriptor achieved the best global hit rates: 96.7% and 99.2% for the BFL and IAM databases, respectively. Regarding the different writing styles, we found the approach to be robust across styles, including forgery, with performance superior to that reported in the literature. Finally, using the writer selection approach, it was possible to achieve equal or better performance using about 50% of the writers available in the training set.
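The dissimilarity representation used in this thesis turns a pair of handwriting samples into a single vector by comparing their texture descriptors, so a classifier can decide same writer vs. different writer. A minimal sketch using a basic 8-neighbor LBP histogram and an absolute-difference dissimilarity (a deliberately simplified version of the descriptors evaluated here):

```python
import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbor LBP: each interior pixel gets an 8-bit code from
    comparing its neighbors to it; the image is then summarized as a
    256-bin histogram of those codes."""
    h, w = img.shape
    c = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(c.shape, dtype=int)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (nb >= c).astype(int) << bit
    hist, _ = np.histogram(code, bins=256, range=(0, 256))
    return hist

def dissimilarity(img_a, img_b):
    """Pair -> single vector: |descriptor(a) - descriptor(b)|.
    Same-writer pairs should yield small values, different-writer pairs larger ones."""
    return np.abs(lbp_histogram(img_a) - lbp_histogram(img_b))

rng = np.random.default_rng(3)
sample_a = rng.random((20, 20))    # stand-in for a handwriting patch
sample_b = rng.random((20, 20))
d_same = dissimilarity(sample_a, sample_a)   # identical samples -> all zeros
d_diff = dissimilarity(sample_a, sample_b)
```

A binary classifier trained on such dissimilarity vectors is writer-independent: it learns the notion of "same vs. different writer" rather than a model per writer, which is why the number of writers in the training set matters less than one might expect.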