    Lip Reading with Hahn Convolutional Neural Networks

    Lipreading, or visual speech recognition, is the process of decoding speech from a speaker's mouth movements. It is used by people with hearing impairment, to understand patients affected by laryngeal cancer or vocal cord paralysis, and in noisy environments. In this paper we aim to develop a visual-only speech recognition system based solely on video. Our main targeted application is in the medical field, for the assistance of laryngectomized persons. To that end, we propose the Hahn Convolutional Neural Network (HCNN), a novel architecture that uses Hahn moments as the first layer of a convolutional neural network (CNN). We show that HCNN helps reduce the dimensionality of video images and shortens training time. The HCNN model is trained to classify letters, digits, or words given as video images. We evaluate the proposed method on three datasets, AVLetters, OuluVS2 and BBC LRW, and show that it achieves significant results in comparison with other works in the literature.
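    The abstract describes the key idea of HCNN: a fixed moment transform as the first layer, shrinking each frame before the convolutional stack. The following is a minimal sketch of that structure, not the authors' implementation: the true Hahn polynomial basis is not given in the text, so an orthonormalised polynomial basis (via QR) stands in for it, and the backbone, moment order, and frame size are all assumptions.

```python
# Sketch of a "moment layer" followed by a CNN, in the spirit of HCNN.
# The Hahn basis is replaced by an orthonormal polynomial basis (assumption);
# all layer sizes are illustrative guesses.
import torch
import torch.nn as nn

def orthonormal_poly_basis(n_points: int, order: int) -> torch.Tensor:
    """Stand-in for the Hahn polynomial basis: orthonormalise x^0..x^(order-1)."""
    x = torch.linspace(-1.0, 1.0, n_points)
    vander = torch.stack([x**k for k in range(order)], dim=1)  # (N, order)
    q, _ = torch.linalg.qr(vander)                             # orthonormal columns
    return q                                                   # (N, order)

class MomentLayer(nn.Module):
    """Fixed (non-trainable) 2-D moment transform M = B_h^T X B_w.
    Truncating to `order` moments shrinks an HxW frame to order x order,
    which is how the first layer reduces dimensionality."""
    def __init__(self, h: int, w: int, order: int):
        super().__init__()
        self.register_buffer("bh", orthonormal_poly_basis(h, order))
        self.register_buffer("bw", orthonormal_poly_basis(w, order))

    def forward(self, x):                        # x: (B, 1, H, W)
        return self.bh.T @ x @ self.bw           # (B, 1, order, order)

class HCNNSketch(nn.Module):
    """Moment layer feeding a small CNN classifier (architecture assumed)."""
    def __init__(self, h=64, w=64, order=32, n_classes=26):
        super().__init__()
        self.moments = MomentLayer(h, w, order)
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):
        return self.net(self.moments(x))

model = HCNNSketch()
logits = model(torch.randn(8, 1, 64, 64))   # 8 mouth-region frames
print(logits.shape)                          # torch.Size([8, 26])
```

    Because the moment basis is a fixed buffer rather than a learned weight, only the convolutional stack is trained, which is consistent with the abstract's claim of reduced training time.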

    Learning to detect chest radiographs containing lung nodules using visual attention networks

    Machine learning approaches hold great potential for the automated detection of lung nodules in chest radiographs, but training the algorithms requires very large amounts of manually annotated images, which are difficult to obtain. Weak labels indicating whether a radiograph is likely to contain pulmonary nodules are typically easier to obtain at scale by parsing historical free-text radiological reports associated with the radiographs. Using a repository of over 700,000 chest radiographs, in this study we demonstrate that promising nodule detection performance can be achieved using weak labels through convolutional neural networks for radiograph classification. We propose two network architectures for the classification of images likely to contain pulmonary nodules using both weak labels and manually-delineated bounding boxes, when these are available. Annotated nodules are used at training time to deliver a visual attention mechanism informing the model about its localisation performance. The first architecture extracts saliency maps from high-level convolutional layers and compares the estimated position of a nodule against the ground truth, when this is available. A corresponding localisation error is then back-propagated along with the softmax classification error. The second approach consists of a recurrent attention model that learns to observe a short sequence of smaller image portions through reinforcement learning. When a nodule annotation is available at training time, the reward function is modified accordingly so that exploring portions of the radiographs away from a nodule incurs a larger penalty. Our empirical results demonstrate the potential advantages of these architectures in comparison to competing methodologies.
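    The first architecture described above combines a weakly-supervised classifier with a saliency-based localisation penalty. The sketch below illustrates one plausible reading of that idea: a CAM-style saliency map whose mass outside an annotated bounding box is penalised when a box exists, and plain cross-entropy otherwise. The backbone, the form of the penalty, and the weighting factor `lam` are assumptions, not the authors' exact design, and the recurrent attention model is omitted.

```python
# Sketch: weakly-supervised nodule classifier with an optional saliency
# penalty driven by bounding-box annotations (all sizes illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeakNoduleNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(          # toy backbone, not the paper's
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        fmap = self.features(x)                        # (B, 64, h, w)
        pooled = fmap.mean(dim=(2, 3))                 # global average pooling
        logits = self.classifier(pooled)
        # Class-activation map for the "nodule" class (index 1, assumed).
        w = self.classifier.weight[1].view(1, -1, 1, 1)
        saliency = (fmap * w).sum(dim=1)               # (B, h, w)
        return logits, saliency

def combined_loss(logits, saliency, labels, box_mask=None, lam=0.5):
    """Cross-entropy plus an optional localisation term. `box_mask` is a
    down-sampled binary mask of the annotated bounding box (shape assumed
    to match `saliency`); it is None for weakly labelled radiographs."""
    loss = F.cross_entropy(logits, labels)
    if box_mask is not None:
        # Penalise saliency mass that falls outside the annotated box, so
        # the localisation error back-propagates with the softmax error.
        attn = F.softmax(saliency.flatten(1), dim=1).view_as(saliency)
        loss = loss + lam * (attn * (1.0 - box_mask)).sum(dim=(1, 2)).mean()
    return loss

net = WeakNoduleNet()
x = torch.randn(4, 1, 64, 64)                  # toy radiograph batch
labels = torch.tensor([0, 1, 1, 0])
logits, sal = net(x)
mask = torch.zeros_like(sal)
mask[:, 4:10, 4:10] = 1.0                      # fake annotated box region
print(combined_loss(logits, sal, labels, box_mask=mask).item())
```

    When no box is available the loss degrades gracefully to ordinary weakly-supervised classification, which matches the abstract's mixed-supervision setup.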