404 research outputs found

    Deep Fragment Embeddings for Bidirectional Image Sentence Mapping

    We introduce a model for bidirectional retrieval of images and sentences through a multi-modal embedding of visual and natural language data. Unlike previous models that directly map images or sentences into a common embedding space, our model works on a finer level and embeds fragments of images (objects) and fragments of sentences (typed dependency tree relations) into a common space. In addition to a ranking objective seen in previous work, this allows us to add a new fragment alignment objective that learns to directly associate these fragments across modalities. Extensive experimental evaluation shows that reasoning on both the global level of images and sentences and the finer level of their respective fragments significantly improves performance on image-sentence retrieval tasks. Additionally, our model provides interpretable predictions, since the inferred inter-modal fragment alignment is explicit.
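    A minimal sketch of the fragment-level scoring idea described above: image fragments and sentence fragments are projected into a shared space and compared by inner product, and the pairwise scores are pooled into a global image-sentence score. The projection matrices W_img and W_txt, the max-then-mean pooling, and all dimensions are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def fragment_scores(image_fragments, sentence_fragments, W_img, W_txt):
    """Project image fragments (objects) and sentence fragments
    (dependency relations) into a shared space and score all pairs
    by inner product."""
    v = image_fragments @ W_img      # (n_img_frags, d)
    s = sentence_fragments @ W_txt   # (n_sent_frags, d)
    return v @ s.T                   # (n_img_frags, n_sent_frags)

def image_sentence_score(pair_scores):
    """Pool fragment-level scores into a global image-sentence score:
    each sentence fragment is aligned to its best-matching image
    fragment, and the matches are averaged (illustrative pooling)."""
    return pair_scores.max(axis=0).mean()

# Toy usage with random features; real inputs would be CNN object
# detections and dependency-relation encodings.
rng = np.random.default_rng(0)
img_frags  = rng.normal(size=(5, 512))   # 5 detected objects
sent_frags = rng.normal(size=(3, 300))   # 3 dependency relations
W_img = rng.normal(size=(512, 128)) * 0.01
W_txt = rng.normal(size=(300, 128)) * 0.01

scores = fragment_scores(img_frags, sent_frags, W_img, W_txt)
print(image_sentence_score(scores))
```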

    Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition

    Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing these temporal dynamics is fundamental for successful HAR. Based on the recent success of recurrent neural networks for time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. Our results show that our framework outperforms competing deep non-recurrent networks on the challenge dataset by 4% on average, outperforming some of the previously reported results by up to 9%. Our results show that the framework can be applied to homogeneous sensor modalities, but can also fuse multimodal sensors to improve performance. We characterise key architectural hyperparameters’ influence on performance to provide insights about their optimisation.
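    A minimal sketch of the convolutional-plus-LSTM pattern the abstract describes, written in PyTorch: 1-D convolutions extract local features across the sensor channels at each time step, an LSTM models the temporal dynamics, and the last hidden state is classified into an activity. Channel count, class count, and layer sizes are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ConvLSTMHAR(nn.Module):
    """Sketch of a convolutional + LSTM recurrent network for wearable
    activity recognition on windows of multichannel sensor data."""
    def __init__(self, n_channels=113, n_classes=18, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):
        # x: (batch, time, channels) sensor window
        h = self.conv(x.transpose(1, 2))       # (batch, 64, time)
        out, _ = self.lstm(h.transpose(1, 2))  # (batch, time, hidden)
        return self.fc(out[:, -1, :])          # activity logits

# Toy forward pass on a random 24-step window of 113 sensor channels.
model = ConvLSTMHAR()
window = torch.randn(8, 24, 113)
print(model(window).shape)  # torch.Size([8, 18])
```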

    Generating Diverse and Meaningful Captions

    Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty. We make our source code publicly available online. Accepted for presentation at the 27th International Conference on Artificial Neural Networks (ICANN 2018).
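    One way to read "a learning signal from an Image Retrieval model" is a reward-weighted (REINFORCE-style) objective in which a retrieval model scores how well a sampled caption matches its image. The sketch below assumes that interpretation; the cosine-similarity reward, the mean baseline, and the txt_encoder interface are illustrative, not the authors' training scheme.

```python
import torch
import torch.nn.functional as F

def retrieval_weighted_caption_loss(log_probs, captions, img_emb, txt_encoder):
    """Weight the sequence log-likelihood of sampled captions by a
    retrieval-style reward: cosine similarity between the caption
    embedding and the image embedding (REINFORCE-style sketch)."""
    # log_probs: (batch, seq_len, vocab) from the captioning decoder
    # captions:  (batch, seq_len) sampled token ids
    token_lp = log_probs.gather(-1, captions.unsqueeze(-1)).squeeze(-1)
    seq_lp = token_lp.sum(dim=1)                      # (batch,)
    with torch.no_grad():
        cap_emb = txt_encoder(captions)               # (batch, d)
        reward = F.cosine_similarity(cap_emb, img_emb, dim=-1)
        reward = reward - reward.mean()               # simple baseline
    return -(reward * seq_lp).mean()

# Toy usage with a dummy bag-of-embeddings text encoder.
vocab, d = 100, 32
emb_table = torch.nn.Embedding(vocab, d)
txt_encoder = lambda ids: emb_table(ids).mean(dim=1)
log_probs = torch.log_softmax(torch.randn(4, 7, vocab), dim=-1)
captions = torch.randint(0, vocab, (4, 7))
img_emb = torch.randn(4, d)
print(retrieval_weighted_caption_loss(log_probs, captions, img_emb, txt_encoder))
```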

    Complete Genome Sequence of Rickettsia parkeri Strain Black Gap

    A unique genotype of Rickettsia parkeri, designated R. parkeri strain Black Gap, has thus far been associated exclusively with the North American tick Dermacentor parumapertus. The complete genome consists of a single circular chromosome of 1,329,522 bp with a G+C content of 32.5%.

    Web Search of Fashion Items with Multimodal Querying

    In this paper, we introduce a novel multimodal fashion search paradigm where e-commerce data is searched with a multimodal query composed of both an image and text. In this setting, the query image shows a fashion product that the user likes, and the query text allows the user to change certain product attributes to fit the product to their desire. Multimodal search gives users the means to clearly express what they are looking for. This is in contrast to current e-commerce search mechanisms, which are cumbersome and often fail to grasp the customer’s needs. Multimodal search requires intermodal representations of visual and textual fashion attributes which can be mixed and matched to form the user’s desired product, and which have a mechanism to indicate when a visual and a textual fashion attribute represent the same concept. With a neural network, we induce a common, multimodal space for visual and textual fashion attributes where their inner product measures their semantic similarity. We build a multimodal retrieval model which operates on the obtained intermodal representations and which ranks images based on their relevance to a multimodal query. We demonstrate that our model is able to retrieve images that both exhibit the necessary query image attributes and satisfy the query texts. Moreover, we show that our model substantially outperforms two state-of-the-art retrieval models adapted to multimodal fashion search.
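    A minimal sketch of the common multimodal space and inner-product ranking described above: image and text features are projected into one space, the query image embedding and the textual-modification embedding are combined into a single query vector, and catalog images are ranked by inner product. The additive query combination and all dimensions are illustrative assumptions rather than the paper's exact model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommonSpace(nn.Module):
    """Shared space for visual and textual attributes: two projections
    map image and text features to the same dimensionality, where the
    inner product measures semantic similarity."""
    def __init__(self, img_dim=2048, txt_dim=300, d=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, d)
        self.txt_proj = nn.Linear(txt_dim, d)

    def embed_image(self, x):
        return F.normalize(self.img_proj(x), dim=-1)

    def embed_text(self, x):
        return F.normalize(self.txt_proj(x), dim=-1)

def rank_catalog(model, query_img, query_txt, catalog_imgs):
    """Combine the query image with the textual modification into one
    multimodal query vector and rank catalog images by inner product."""
    q = F.normalize(model.embed_image(query_img) + model.embed_text(query_txt), dim=-1)
    sims = model.embed_image(catalog_imgs) @ q.T      # (n_items, 1)
    return sims.squeeze(-1).argsort(descending=True)

# Toy usage with random features standing in for CNN image features
# and averaged word vectors of the query text.
model = CommonSpace()
query_img = torch.randn(1, 2048)
query_txt = torch.randn(1, 300)
catalog = torch.randn(50, 2048)
print(rank_catalog(model, query_img, query_txt, catalog)[:5])
```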

    Deep learning for situational understanding

    Situational understanding (SU) requires a combination of insight (the ability to accurately perceive an existing situation) and foresight (the ability to anticipate how an existing situation may develop in the future). SU involves information fusion as well as model representation and inference. Commonly, heterogeneous data sources must be exploited in the fusion process, often including both hard and soft data products. In a coalition context, data and processing resources will also be distributed and subject to restrictions on information sharing. It will often be necessary for a human to be in the loop in SU processes, to provide key input and guidance and to interpret outputs, which necessitates a degree of transparency in the processing: systems cannot be “black boxes”. In this paper, we characterize the Coalition Situational Understanding (CSU) problem in terms of fusion, temporal, distributed, and human requirements. There is currently significant interest in deep learning (DL) approaches for processing both hard and soft data. We analyze the state-of-the-art in DL in relation to these requirements for CSU, and identify both areas of considerable promise and key gaps.

    Clean Heat and Energy Efficiency Workforce Assessment

    Abstract - To meet the Scottish Government’s ambitious climate change targets, there will need to be a significant increase in the deployment of energy efficiency and low carbon heat measures in domestic and non-domestic buildings over the next decade. To deliver this, the supply chain in Scotland needs to be fit for purpose in terms of having the capacity and skills to deliver this scale of technology deployment. This research explores current and future workforce capabilities around energy efficiency and low carbon heat technologies. It reviews the current capabilities and skills along the supply chain for energy efficiency and low carbon heating technologies in Scotland, identifies the skills gaps, and analyses the potential options to fill these gaps to meet the targets set out in the Heat in Buildings Strategy.