15 research outputs found

    Hyperspectral Image Classification -- Traditional to Deep Models: A Survey for Future Prospects

    Hyperspectral Imaging (HSI) has been extensively utilized in many real-life applications because it benefits from the detailed spectral information contained in each pixel. Notably, the complex characteristics of HSI data, i.e., the nonlinear relation between the captured spectral information and the corresponding object, make accurate classification challenging for traditional methods. In the last few years, Deep Learning (DL) has been established as a powerful feature extractor that effectively addresses the nonlinear problems that appear in a number of computer vision tasks. This has prompted the deployment of DL for HSI classification (HSIC), where it has shown good performance. This survey presents a systematic overview of DL for HSIC and compares state-of-the-art strategies on the topic. First, we summarize the main challenges of traditional machine learning for HSIC and then explain how DL addresses these problems. The survey breaks down the state-of-the-art DL frameworks into spectral-feature, spatial-feature, and joint spatial-spectral-feature approaches to systematically analyze the achievements (and future research directions) of these frameworks for HSIC. Moreover, we consider the fact that DL requires a large number of labeled training examples, whereas acquiring such a number for HSIC is challenging in terms of time and cost. Therefore, this survey discusses some strategies to improve the generalization performance of DL strategies, which can provide some future guidelines.

    Learning Semantic Information from Multimodal Data using Deep Neural Networks

    During the last decades, most collective information has been digitized to form an immense database distributed across the Internet. This can also be referred to as Big data, a collection of data that is vast in volume and still growing with time. Nowadays, we can say that Big data is everywhere. We might not even realize how much it affects our daily life, as it is applied in many areas, ranging from online shopping, music streaming, TV streaming, travel and transportation, energy, and fighting crime to health care. Many organizations and companies have been collecting and analyzing large volumes of data to solve domain-specific problems or make business decisions. One of the powerful tools that can be used to extract value from Big data is Deep learning, a type of machine learning algorithm based on artificial neural networks, which are inspired by the structure and function of the human brain and learn from large amounts of data. Deep learning has been widely applied in many research fields such as natural language processing, IoT applications, and computer vision. In this thesis, we introduce three Deep Neural Networks that are used to learn semantic information from different types of data, and a design guideline to accelerate a Neural Network layer on a general-purpose computing platform. First, we focus on text data. We proposed a new feature extraction technique to preprocess the dataset and optimized the original Restricted Boltzmann Machine (RBM) model to generate more meaningful topics that better represent the given document. Our proposed method can improve the generated topic accuracy by up to 12.99% on the Open Movie, Reuters, and 20NewsGroup datasets. Moving from text to image data with additional click locations, we proposed a human-in-the-loop automatic image labeling framework focusing on aerial images with fewer features for detection. The proposed model consists of two main parts, a prediction model and an adjustment model.
The user first provides click locations to the prediction model to generate a bounding box of a specific object. The bounding box is then fine-tuned by the adjustment model for more accurate size and location. A feedback and retrain mechanism is implemented that allows users to manually adjust the generated bounding box and provide feedback to incrementally train the adjustment network during runtime. This online learning feature enables the user to generalize the existing model to target classes not initially present in the training set, and gradually improves the specificity of the model to those new targets during online learning. Combining text and image data, we proposed a Multi-region Attention-assisted Grounding network (MAGNet) framework that utilizes spatial attention networks for image-level visual-textual fusion, preserving local (word) and global (phrase) information to refine region proposals with an in-network Region Proposal Network (RPN) and detect single or multiple regions for a phrase query. Our framework is independent of external proposal generation systems and, without additional information, it can develop an understanding of the query phrase in relation to the image to achieve respectable results on Flickr30k Entities and a 12% improvement over the state of the art on ReferIt Game. Additionally, our model is capable of grounding multiple regions for a query phrase, which is more suitable for real-life applications. Although Deep neural networks (DNNs) have become a powerful tool, they are highly expensive in both computational time and storage cost. To improve the performance of the network while maintaining accuracy, the block-circulant matrix-based (BCM) algorithm has been introduced. It has been proven to be highly effective when implemented using customized hardware, such as FPGAs. However, its performance suffers on general-purpose computing platforms.
In certain cases, using the BCM does not improve the total computation time of the networks at all. To address this problem, we proposed a parallel implementation of the BCM layer and provided guidelines that generally lead to better implementation practice. The guidelines cover popular implementation languages and packages, including Python, numpy, intel-numpy, tensorflow, and nGraph.
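The abstract does not show how a BCM layer is computed, so the following is only a minimal numpy sketch of the standard idea behind it, not the thesis's parallel implementation. It assumes the weight matrix is a p-by-q grid of k-by-k circulant blocks, each stored as its first column, and uses the FFT identity for circular convolution. The names `circulant`, `dense_from_blocks`, and `bcm_matvec` are hypothetical.

```python
import numpy as np


def circulant(c):
    """Dense circulant matrix whose first column is c (illustrative helper)."""
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])


def dense_from_blocks(blocks):
    """Expand (p, q, k) first-column blocks into the full (p*k, q*k) matrix."""
    p, q, _ = blocks.shape
    return np.block([[circulant(blocks[i, j]) for j in range(q)] for i in range(p)])


def bcm_matvec(blocks, x):
    """Multiply a block-circulant matrix by x without forming the dense matrix.

    A circulant block times a vector is a circular convolution, which becomes
    an element-wise product after an FFT. Assumes real-valued inputs.
    """
    p, q, k = blocks.shape
    xf = np.fft.fft(x.reshape(q, k), axis=1)      # FFT of each input segment
    bf = np.fft.fft(blocks, axis=2)               # FFT of each block's first column
    yf = np.einsum("pqk,qk->pk", bf, xf)          # sum over input blocks per output block
    return np.fft.ifft(yf, axis=1).real.reshape(p * k)


rng = np.random.default_rng(0)
blocks = rng.standard_normal((2, 3, 4))           # 2x3 grid of 4x4 circulant blocks
x = rng.standard_normal(12)
print(np.allclose(dense_from_blocks(blocks) @ x, bcm_matvec(blocks, x)))
```

The sketch stores p*q*k weights instead of p*q*k*k. Whether the FFT route is actually faster on a general-purpose platform depends on block size and library overhead, which is the concern the abstract raises.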

    New approaches to interactive multimedia content retrieval from different sources

    Doctoral degree with International Mention. Interactive Multimodal Information Retrieval (IMIR) systems extend the capabilities of traditional search systems with the ability to retrieve information of different types (modes) and from different sources. The growth of online content, together with the diversification of the means of access to information (phones, tablets, smart watches), drives the growing need for this type of system. In this thesis, a formal model for describing interactive multimodal information retrieval systems that query various information retrieval engines has been defined. This model includes a formal and general definition of each component of an IMIR system, namely: multimodal information organized in collections, the multimodal query, different retrieval engines, a source management system (handler), a results management module (fusion) and user interactions. This model has been validated in two scenarios. The first is a use case focused on information retrieval on sports. A prototype that implements a subset of the features of the model has been developed: a multimodal collection that is semantically related, three types of multimodal queries (text, audio and text + image), six different retrieval engines (question answering, full-text search, ontology-based search, OCR in images, object detection in images and audio transcription), a source selection strategy based on rules defined by experts, a results combination strategy and recording of user interactions. NDCG (normalized discounted cumulative gain) has been used to compare the results obtained for each retrieval engine. These results are: 10.1% (question answering), 80% (full-text search) and 26.8% (ontology search). These results are comparable to state-of-the-art work reported at forums such as CLEF (Cross-Language Evaluation Forum).
When the retrieval engine combination is used, information retrieval performance increases by a percentage gain of 771.4% over question answering, 7.2% over full-text search and 145.5% over ontology search. The second scenario is a prototype focused on retrieving information from social media in the health domain. A prototype based on the proposed model has been developed that integrates user-generated health-domain information from social media, knowledge bases, the query, retrieval engines, a source selection module, a results combination module and a GUI. In addition, the documents included in the retrieval system have been previously annotated by a process that extracts semantic information in the health domain. Furthermore, several techniques for adapting the retrieval functionality of an IMIR system have been defined by analyzing past interactions using decision trees, neural networks and clustering. After modifying the source selection strategy (handler), the system has been re-evaluated using classification techniques. The same queries and relevance judgments made by users in the sports domain prototype were used for this evaluation. This evaluation compares the NDCG measure obtained with two different approaches: the multimodal system using predefined rules and the same multimodal system once its functionality has been adapted using past user interactions. The NDCG changed by between -2.92% and 2.81%, depending on the approach used. We have considered three features to classify the approaches: (i) the classification algorithm; (ii) the query features; and (iii) the scores for computing the order of retrieval engines. The best result is obtained using a probability-based classification algorithm, the retrieval engine ranking generated with the Averaged-Position score (the mean position of the first relevant result) and the mode, type, length and entities of the query.
Its NDCG value is 81.54%. Official Doctoral Programme in Computer Science and Technology. Committee: Chair: Ana García Serrano; Secretary: María Belén Ruiz Mezcua; Member: Davide Buscald
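The abstract reports NDCG scores and percentage gains without defining them. As a rough illustration only, the sketch below uses one common NDCG formulation (graded relevance divided by the base-2 log of rank plus one) and a plain relative-gain formula. The thesis may use a different variant, and the function names and sample relevance grades are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def percentage_gain(baseline, combined):
    """Relative improvement of `combined` over `baseline`, in percent."""
    return 100.0 * (combined - baseline) / baseline


# Hypothetical relevance grades for one query, in the order an engine returned them.
ranked = [1, 3, 0, 2]
print(f"NDCG = {ndcg(ranked):.3f}")
print(f"gain = {percentage_gain(0.5, 0.75):.1f}%")
```

Because the gain is relative to the baseline, an engine with a low starting NDCG can show a very large percentage gain even when the combined score is moderate, which is consistent with the spread of gains quoted in the abstract.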

    Remote Sensing of Natural Hazards

    Each year, natural hazards such as earthquakes, cyclones, flooding, landslides, wildfires, avalanches, volcanic eruptions, extreme temperatures, storm surges, and drought result in widespread loss of life, livelihood, and critical infrastructure globally. With the unprecedented growth of the human population, large-scale development activities, and changes to the natural environment, the frequency and intensity of extreme natural events and their consequent impacts are expected to increase in the future. Technological interventions provide essential provisions for the prevention and mitigation of natural hazards. In particular, the data obtained through remote sensing systems with varied spatial, spectral, and temporal resolutions provide prospects for furthering knowledge on spatiotemporal patterns and forecasting of natural hazards. The collection of data using earth observation systems has been valuable for alleviating the adverse effects of natural hazards, especially with their near real-time capabilities for tracking extreme natural events. Remote sensing systems on different platforms also serve as an important decision-support tool for devising response strategies, coordinating rescue operations, and making damage and loss estimations. With these in mind, this book seeks original contributions on advanced applications of remote sensing and geographic information systems (GIS) techniques in understanding various dimensions of natural hazards through new theory, data products, and robust approaches.

    Object recognition in infrared imagery using appearance-based methods

    Abstract unavailable; please refer to the PDF.

    Video based detection of normal and anomalous behaviour of individuals

    This PhD research has proposed novel computer vision and machine learning algorithms for the problem of video-based anomalous event detection of individuals. Several varieties of Hidden Markov Models were designed to model the temporal and spatial causalities of crowd behaviour. A Markov Random Field on top of a Gaussian Mixture Model is proposed to incorporate spatial context information during classification. Discriminative conditional random field methods are also proposed. Novel features are proposed to extract motion and appearance information. Most of the proposed approaches comprehensively outperformed other techniques on publicly available datasets at the time the publications arising from this work appeared.

    Knowledge Modelling and Learning through Cognitive Networks

    One of the most promising developments in modelling knowledge is cognitive network science, which aims to investigate cognitive phenomena driven by the networked, associative organization of knowledge. For example, investigating the structure of semantic memory via semantic networks has illuminated how memory recall patterns influence phenomena such as creativity, memory search, learning, and more generally, knowledge acquisition, exploration, and exploitation. In parallel, neural network models for artificial intelligence (AI) are also becoming more widespread as inferential models for understanding which features drive language-related phenomena such as meaning reconstruction, stance detection, and emotional profiling. Whereas cognitive networks map explicitly which entities engage in associative relationships, neural networks perform an implicit mapping of correlations in cognitive data as weights, obtained after training over labelled data and whose interpretation is not immediately evident to the experimenter. This book aims to bring together quantitative, innovative research that focuses on modelling knowledge through cognitive and neural networks to gain insight into mechanisms driving cognitive processes related to knowledge structuring, exploration, and learning. The book comprises a variety of publication types, including reviews and theoretical papers, empirical research, computational modelling, and big data analysis. All papers here share a commonality: they demonstrate how the application of network science and AI can extend and broaden cognitive science in ways that traditional approaches cannot

    Advances in knowledge discovery and data mining Part II

    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II