150 research outputs found

    Discrete language models for video retrieval

    Finding relevant video content is important for producers of television news, documentaries and commercials. As digital video collections become more widely available, content-based video retrieval tools will likely grow in importance for an even wider group of users. In this thesis we investigate language modelling approaches, which have been the focus of recent attention within the text information retrieval community, for the video search task. Language models are smoothed discrete generative probability distributions, generally of text, and provide a neat information retrieval formalism that we believe is equally applicable to traditional visual features and to text. We propose to model colour, edge and texture histogram-based features directly with discrete language models, and this approach is compatible with further traditional visual feature representations. We provide a comprehensive and robust empirical study of smoothing methods, hierarchical semantic and physical structures, and fusion methods for this language modelling approach to video retrieval. The advantage of our approach is that it provides a consistent, effective and relatively efficient model for video retrieval.
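
As an illustration of the approach this abstract describes, a discrete language model over histogram bins can be smoothed with a collection (background) model and used to rank documents by query likelihood. The sketch below uses Jelinek-Mercer (linear) smoothing; the toy histograms, bin counts and the smoothing parameter `lam` are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

def smoothed_lm_score(query_hist, doc_hist, collection_hist, lam=0.5):
    """Score a document (e.g. a video shot) against a query under a
    Jelinek-Mercer smoothed discrete language model: each histogram bin
    is treated as a discrete 'term'. Higher (less negative) is better."""
    q = np.asarray(query_hist, dtype=float)
    d = np.asarray(doc_hist, dtype=float)
    c = np.asarray(collection_hist, dtype=float)
    p_doc = d / d.sum()                      # maximum-likelihood bin probabilities
    p_coll = c / c.sum()                     # collection (background) model
    p = lam * p_doc + (1 - lam) * p_coll     # linear (Jelinek-Mercer) smoothing
    # query likelihood: query bin counts weighted by log bin probability
    return float(np.sum(q * np.log(p)))

# toy 4-bin colour histograms (illustrative)
query = [3, 1, 0, 0]
doc_a = [5, 2, 1, 0]   # similar colour distribution to the query
doc_b = [0, 0, 4, 4]   # dissimilar distribution
coll  = [10, 5, 8, 7]  # collection-wide bin counts
assert smoothed_lm_score(query, doc_a, coll) > smoothed_lm_score(query, doc_b, coll)
```

Smoothing with the collection model keeps every bin probability strictly positive, so zero-count bins in a document do not produce a log-of-zero.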

    Dublin City University at CLEF 2006: Experiments for the ImageCLEF Photo Collection Standard Ad Hoc Task

    We provide a technical description of our submission to the CLEF 2006 Cross Language Image Retrieval (ImageCLEF) Photo Collection Standard Ad Hoc task. We performed monolingual and cross-language retrieval of photo images using photo annotations with and without feedback, and also a combined visual and text retrieval approach. Topics are translated into English using the Babelfish online machine translation system. Our text runs used the BM25 algorithm, while our visual approach used simple low-level features with matching based on the Jeffrey Divergence measure. Our results consistently indicate that the fusion of text and visual features is best for this task, and that performing feedback for text consistently improves on the baseline non-feedback BM25 text runs for all language pairs.
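
The Jeffrey Divergence used for the visual runs is a symmetric, smoothed relative of KL divergence. A minimal sketch of comparing two normalised feature histograms with it (the toy histograms and the `eps` floor are illustrative assumptions, not the submission's exact implementation):

```python
import numpy as np

def jeffrey_divergence(h1, h2, eps=1e-12):
    """Jeffrey Divergence between two histograms: each is compared
    against their bin-wise mean, which keeps the measure symmetric and
    finite even when one histogram has empty bins. Lower = more similar."""
    p = np.asarray(h1, dtype=float); p = p / p.sum()
    q = np.asarray(h2, dtype=float); q = q / q.sum()
    m = (p + q) / 2.0                       # bin-wise mean distribution
    p = np.clip(p, eps, None); q = np.clip(q, eps, None); m = np.clip(m, eps, None)
    return float(np.sum(p * np.log(p / m) + q * np.log(q / m)))

a = [4, 3, 2, 1]
b = [4, 3, 2, 1]
c = [1, 2, 3, 4]
assert jeffrey_divergence(a, b) < 1e-9                      # identical histograms
assert jeffrey_divergence(a, c) > jeffrey_divergence(a, b)  # dissimilar pair scores higher
```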

    Computational Diagnosis of Skin Lesions from Dermoscopic Images using Combined Features

    There has been an alarming increase in the number of skin cancer cases worldwide in recent years, which has raised interest in computational systems for automatic diagnosis to assist early diagnosis and prevention. Feature extraction to describe skin lesions is a challenging research area due to the difficulty in selecting meaningful features. The main objective of this work is to find the best combination of features, based on shape properties, colour variation and texture analysis, to be extracted using various feature extraction methods. Several colour spaces are used for the extraction of both colour- and texture-related features. Different categories of classifiers were adopted to evaluate the proposed feature extraction step, and several feature selection algorithms were compared for the classification of skin lesions. The developed skin lesion computational diagnosis system was applied to a set of 1104 dermoscopic images using a cross-validation procedure. The best results were obtained by an optimum-path forest classifier. The proposed system achieved an accuracy of 92.3%, sensitivity of 87.5% and specificity of 97.1% when the full set of features was used. Furthermore, it achieved an accuracy of 91.6%, sensitivity of 87% and specificity of 96.2% when 50 features were selected using a correlation-based feature selection algorithm.
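
Correlation-based feature selection, mentioned above, scores a candidate subset by the ratio of average feature-class correlation to average feature-feature redundancy. A greedy forward-selection sketch under that merit criterion; the synthetic data and stopping rule are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

def cfs_merit(X, y, subset):
    """Merit of a feature subset: high average correlation with the class,
    penalised by redundancy (inter-feature correlation) within the subset."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        r_ff = 0.0
    else:
        r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                        for i in subset for j in subset if i < j])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(X, y, max_features):
    """Forward selection: repeatedly add the feature that most raises the merit."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        best_score, best_j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if selected and best_score <= cfs_merit(X, y, selected):
            break  # no candidate improves the current subset
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# synthetic check: one informative feature, one noise feature, one redundant copy
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200).astype(float)
X = np.column_stack([y + 0.1 * rng.standard_normal(200),   # informative
                     rng.standard_normal(200),              # pure noise
                     y + 0.1 * rng.standard_normal(200)])   # redundant copy
sel = greedy_cfs(X, y, 2)
assert sel[0] in (0, 2)   # an informative feature is chosen first
```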

    An investigation of a human in the loop approach to object recognition

    For several decades researchers around the globe have been avidly investigating practical solutions to the enduring problem of understanding visual content within an image. One might think of the quest as an effort to emulate the human visual system. Despite all the endeavours, the simplest of visual tasks to us humans, such as optical segmentation of objects, remain a significant challenge for machines. On the few occasions where a computer's processing power is adequate to accomplish the task, the issue of public alienation towards autonomous solutions to critical applications remains unresolved. The principal purpose of this thesis is to propose innovative computer vision, machine learning, and pattern recognition techniques that exploit the abstract knowledge of human beings in practical models using simple yet effective methodologies. High-level information provided by users in the decision-making loop of such interactive systems enhances the efficacy of vision algorithms, whilst machines simultaneously reduce users' labour by filtering results and completing mundane tasks on their behalf. In this thesis, we initially draw a vivid picture of interactive approaches to vision tasks before scrutinising relevant aspects of human in the loop methodologies and highlighting their current shortcomings in object recognition applications. Our survey of the literature reveals that the difficulty of harnessing users' abstract knowledge is among the major complications of human in the loop algorithms. We therefore propose two novel methodologies to capture and model such high-level sources of information. One solution builds innovative textual descriptors that are compatible with discriminative classifiers. The other is based on the random naive Bayes algorithm and is suitable for generative classification frameworks. We further investigate the difficult problem of fusing images' low-level and users' high-level information sources.
Our next contribution is therefore a novel random-forest-based human in the loop framework that efficiently fuses visual features of images with user-provided information for fast predictions and superior classification performance. User abstract knowledge in this method is harnessed in the shape of users' answers to perceptual questions about images. In contrast to generative Bayesian frameworks, this is a direct discriminative approach that enables information-source fusion in the preliminary stages of the prediction process. We subsequently present inventive generative frameworks that model each source of information individually and determine the most effective for the purpose of class label prediction. We propose two innovative and intelligent human in the loop fusion algorithms. Our first algorithm is a modified naive Bayes greedy technique, while our second solution is based on a feedforward neural network. Through experiments on a variety of datasets, we show that our novel intelligent fusion methods of information-source selection outperform their competitors in tasks of fine-grained visual categorisation. We additionally present methodologies to reduce unnecessary human involvement in mundane tasks by focusing only on cases where their invaluable abstract knowledge is of utmost importance. Our proposed algorithm is based on information theory and recent image annotation techniques. It determines the most efficient sequence of information to obtain from humans in the decision-making loop, in order to minimise their unnecessary engagement in routine tasks. This approach allows them to concentrate on more abstract functions instead. Our experimental results reveal faster achievement of peak performance in contrast to alternative random ranking systems. Our final major contribution in this thesis is a novel remedy for the curse of dimensionality in pattern recognition problems. It is theoretically based on mutual information and Fano's inequality.
Our approach isolates the most discriminative descriptors and can enhance the accuracy of classification algorithms. The process of selecting a subset of relevant features is vital for designing robust human in the loop vision models. Our selection techniques eliminate redundant and irrelevant visual and textual features, and their influence on the improvement of various human in the loop algorithms proves fundamental in our experiments.
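
The information-theoretic question-sequencing idea described above can be sketched as expected information gain: ask the question whose answer is expected to reduce the entropy of the class posterior the most, so the user's effort is spent only where it matters. The yes/no question model and all numbers below are illustrative assumptions, not the thesis's actual algorithm.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def expected_info_gain(prior, likelihoods):
    """Expected reduction in class entropy from one yes/no question.
    `likelihoods[c]` is P(answer = yes | class c)."""
    n = len(prior)
    p_yes = sum(prior[c] * likelihoods[c] for c in range(n))
    p_no = 1.0 - p_yes

    def posterior(ans_yes):
        post = [prior[c] * (likelihoods[c] if ans_yes else 1 - likelihoods[c])
                for c in range(n)]
        z = sum(post)
        return [x / z for x in post]

    h_after = p_yes * entropy(posterior(True)) + p_no * entropy(posterior(False))
    return entropy(prior) - h_after

def best_question(prior, questions):
    """Index of the question with the highest expected information gain."""
    return max(range(len(questions)),
               key=lambda i: expected_info_gain(prior, questions[i]))

prior = [0.5, 0.5]
q_informative = [0.9, 0.1]   # answer strongly depends on the class
q_useless = [0.5, 0.5]       # answer independent of the class
assert best_question(prior, [q_useless, q_informative]) == 1
assert expected_info_gain(prior, q_useless) < 1e-9
```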

    IMAGE RETRIEVAL BASED ON COMPLEX DESCRIPTIVE QUERIES

    The amount of visual data such as images and videos available on the web has increased exponentially over the last few years. In order to efficiently organize and exploit these massive collections, a system, apart from being able to answer simple classification-based questions such as whether a specific object is present (or absent) in an image, should also be capable of searching images and videos based on more complex descriptive questions. There is also a considerable amount of structure present in the visual world which, if effectively utilized, can help achieve this goal. To this end, we first present an approach for image ranking and retrieval based on queries consisting of multiple semantic attributes. We further show that there are significant correlations between these attributes, and that accounting for them leads to superior performance. Next, we extend this by proposing an image retrieval framework for descriptive queries composed of object categories, semantic attributes and spatial relationships. The proposed framework also includes a unique multi-view hashing technique, which enables query specification in three different modalities: image, sketch and text. We also demonstrate the effectiveness of leveraging contextual information to reduce the supervision requirements for learning object and scene recognition models. We present an active learning framework to simultaneously learn appearance and contextual models for scene understanding. Within this framework we introduce new kinds of labeling questions that are designed to collect appearance as well as contextual information and that mimic the way in which humans actively learn about their environment. Furthermore, we explicitly model the contextual interactions between the regions within an image and select the question that leads to the maximum reduction in the combined entropy of all the regions in the image (image entropy).
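
A minimal sketch of ranking images for a multi-attribute query, with an optional pairwise term that rewards attribute pairs known to co-occur, echoing the attribute-correlation idea above. All scores, attribute names and the correlation term are illustrative assumptions, not the work's exact model.

```python
import numpy as np

def rank_by_attributes(scores, query, corr=None, corr_weight=0.1):
    """Rank images for a multi-attribute query. `scores[i, a]` is a
    calibrated probability that image i exhibits attribute a.
    Baseline: sum of log-probabilities of the queried attributes.
    If `corr` (attribute-attribute correlation) is given, a pairwise
    term rewards jointly likely, positively correlated attribute pairs."""
    s = np.log(np.clip(scores[:, query], 1e-9, 1.0)).sum(axis=1)
    if corr is not None:
        for a in query:
            for b in query:
                if a < b:
                    s = s + corr_weight * corr[a, b] * scores[:, a] * scores[:, b]
    return np.argsort(-s)          # indices of images, best match first

# toy example: 3 images, 3 hypothetical attributes ('smiling', 'outdoor', 'male')
scores = np.array([[0.9, 0.8, 0.1],
                   [0.2, 0.9, 0.9],
                   [0.9, 0.1, 0.9]])
ranking = rank_by_attributes(scores, query=[0, 1])  # 'smiling' AND 'outdoor'
assert ranking[0] == 0   # only image 0 scores well on both attributes
```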

    Automatic analysis of retinal images to aid in the diagnosis and grading of diabetic retinopathy

    Diabetic retinopathy (DR) is the most common complication of diabetes mellitus and one of the leading causes of preventable blindness in the adult working population. Visual loss can be prevented from the early stages of DR, when the treatments are effective. Therefore, early diagnosis is paramount. However, DR may be clinically asymptomatic until the advanced stage, when vision is already affected and treatment may become difficult. For this reason, diabetic patients should undergo regular eye examinations through screening programs. Traditionally, DR screening programs are run by trained specialists through visual inspection of the retinal images. However, this manual analysis is time consuming and expensive. With the increasing incidence of diabetes and the limited number of clinicians and sanitary resources, the early detection of DR through manual screening becomes unviable. For this reason, computer-aided diagnosis (CAD) systems are required to assist specialists with a fast, reliable diagnosis, reducing the workload and the associated costs. We hypothesize that the application of novel, automatic algorithms for fundus image analysis could contribute to the early diagnosis of DR. Consequently, the main objective of the present Doctoral Thesis is to study, design and develop novel methods based on the automatic analysis of fundus images to aid in the screening, diagnosis, and treatment of DR. In order to achieve the main goal, we built a private database and used five retinal public databases: DRIMDB, DIARETDB1, DRIVE, Messidor and Kaggle. The stages of fundus image processing covered in this Thesis are: retinal image quality assessment (RIQA), the location of the optic disc (OD) and the fovea, the segmentation of red lesions (RLs) and exudates (EXs), and the DR severity grading. RIQA was studied with two different approaches. The first approach was based on the combination of novel, global features.
Results achieved 91.46% accuracy, 92.04% sensitivity, and 87.92% specificity using the private database. We developed a second approach to RIQA based on deep learning. We achieved 95.29% accuracy with the private database and 99.48% accuracy with the DRIMDB database. The location of the OD and the fovea was performed using a combination of saliency maps. The proposed methods were evaluated on the private database and the public databases DRIVE, DIARETDB1 and Messidor. For the OD, we achieved 100% accuracy for all databases except Messidor (99.50%). As for the fovea location, we also reached 100% accuracy for all databases except Messidor (99.67%). The joint segmentation of RLs and EXs was accomplished by decomposing the fundus image into layers. Results were computed per pixel and per image. Using the private database, 88.34% per-image accuracy (ACCi) was reached for RL detection and 95.41% ACCi for EX detection. An additional method was proposed for the segmentation of RLs based on superpixels. Evaluating this method with the private database, we obtained 84.45% ACCi. Results were validated using the DIARETDB1 database. Finally, we proposed a deep learning framework for automatic DR severity grading. The method was based on a novel attention mechanism which attends separately to the dark and the bright structures of the retina. The Kaggle DR detection dataset was used for development and validation. The International Clinical DR Scale, comprising five DR severity levels, was considered. Classification results for all classes achieved 83.70% accuracy and a Quadratic Weighted Kappa of 0.78. The methods proposed in this Doctoral Thesis form a complete, automatic DR screening system, contributing to the early detection of DR. In this way, diabetic patients could receive better attention for their ocular health, avoiding vision loss.
In addition, the workload of specialists could be relieved while healthcare costs are reduced.
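
The Quadratic Weighted Kappa reported above measures agreement beyond chance for ordinal grades, penalising disagreements by their squared distance on the scale (here, the 5-level International Clinical DR Scale). A minimal sketch; the toy grade vectors are illustrative:

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """Quadratic Weighted Kappa for ordinal labels: 1 = perfect agreement,
    0 = chance-level agreement. A grade-4 vs grade-0 error costs far more
    than a grade-1 vs grade-0 error."""
    y_true = np.asarray(y_true); y_pred = np.asarray(y_pred)
    O = np.zeros((n_classes, n_classes))            # observed confusion matrix
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # quadratic penalty, normalised so the worst disagreement has weight 1
    w = np.array([[(i - j) ** 2 for j in range(n_classes)]
                  for i in range(n_classes)], dtype=float)
    w /= (n_classes - 1) ** 2
    hist_t = O.sum(axis=1); hist_p = O.sum(axis=0)
    E = np.outer(hist_t, hist_p) / O.sum()          # expected matrix under chance
    return 1.0 - (w * O).sum() / (w * E).sum()

grades = [0, 1, 2, 3, 4, 2, 1]
assert abs(quadratic_weighted_kappa(grades, grades) - 1.0) < 1e-9  # perfect agreement
```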