Deep Fragment Embeddings for Bidirectional Image Sentence Mapping
We introduce a model for bidirectional retrieval of images and sentences
through a multi-modal embedding of visual and natural language data. Unlike
previous models that directly map images or sentences into a common embedding
space, our model works on a finer level and embeds fragments of images
(objects) and fragments of sentences (typed dependency tree relations) into a
common space. In addition to a ranking objective seen in previous work, this
allows us to add a new fragment alignment objective that learns to directly
associate these fragments across modalities. Extensive experimental evaluation
shows that reasoning on both the global level of images and sentences and the
finer level of their respective fragments significantly improves performance on
image-sentence retrieval tasks. Additionally, our model provides interpretable
predictions since the inferred inter-modal fragment alignment is explicit.
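To make the two-level setup concrete, below is a minimal PyTorch sketch: fragments from each modality are projected into a common space, fragment similarities are pooled into a global image-sentence score, and a hinge ranking loss pushes matched pairs above mismatched ones. All dimensions, the pooling rule, and the exact loss form are illustrative assumptions, not the paper's precise formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FragmentEmbedder(nn.Module):
    """Projects image fragments (e.g. CNN features of detected objects) and
    sentence fragments (e.g. vectors for dependency relations) into a common
    embedding space. Dimensions here are placeholders."""
    def __init__(self, img_dim=4096, txt_dim=600, emb_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)
        self.txt_proj = nn.Linear(txt_dim, emb_dim)

    def forward(self, img_frags, txt_frags):
        v = F.normalize(self.img_proj(img_frags), dim=-1)  # (n_img, emb_dim)
        s = F.normalize(self.txt_proj(txt_frags), dim=-1)  # (n_txt, emb_dim)
        return v @ s.T  # fragment-level similarity scores, (n_img, n_txt)

def global_score(frag_scores):
    # Pool fragment scores into one image-sentence score: each sentence
    # fragment is aligned with its best-matching image fragment, which is
    # what makes the inferred inter-modal alignment explicit.
    return frag_scores.max(dim=0).values.mean()

def ranking_loss(pair_scores, margin=0.1):
    # Hinge ranking loss on a (batch, batch) matrix of image-sentence
    # scores: matched pairs (the diagonal) must outscore mismatched pairs.
    pos = pair_scores.diag().unsqueeze(1)
    cost = (margin + pair_scores - pos).clamp(min=0)
    cost.fill_diagonal_(0)
    return cost.mean()

# Toy usage: 5 image fragments vs. 7 sentence fragments for one pair
model = FragmentEmbedder()
scores = model(torch.randn(5, 4096), torch.randn(7, 600))
print(global_score(scores), ranking_loss(torch.randn(4, 4)))
```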
Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition
Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing these temporal dynamics is fundamental for successful HAR. Based on the recent success of recurrent neural networks for time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. Our results show that our framework outperforms competing deep non-recurrent networks on the challenge dataset by 4% on average, and outperforms some previously reported results by up to 9%. The results also show that the framework can be applied to homogeneous sensor modalities, but can also fuse multimodal sensors to improve performance. We characterise the influence of key architectural hyperparameters on performance to provide insights about their optimisation.
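A minimal PyTorch sketch of this kind of convolutional + LSTM pipeline is given below, assuming raw sensor windows of shape (batch, time, channels); the channel, class, filter, and hidden-unit counts are illustrative placeholders, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

class DeepConvLSTM(nn.Module):
    """Sketch of a convolutional + LSTM framework for HAR: convolutions
    extract features from raw multimodal sensor channels, then LSTM layers
    model the temporal dynamics of those feature activations."""
    def __init__(self, n_channels=113, n_classes=18, n_filters=64, n_hidden=128):
        super().__init__()
        # 1-D convolutions along the time axis fuse all sensor channels,
        # replacing heuristic feature engineering with learned filters
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, n_filters, kernel_size=5), nn.ReLU(),
            nn.Conv1d(n_filters, n_filters, kernel_size=5), nn.ReLU(),
        )
        self.lstm = nn.LSTM(n_filters, n_hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(n_hidden, n_classes)

    def forward(self, x):                       # x: (batch, time, channels)
        x = self.conv(x.transpose(1, 2))        # -> (batch, filters, time')
        out, _ = self.lstm(x.transpose(1, 2))   # -> (batch, time', hidden)
        return self.head(out[:, -1])            # classify from last time step

# Toy forward pass: 8 windows of 24 time steps over 113 sensor channels
logits = DeepConvLSTM()(torch.randn(8, 24, 113))
```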
Generating Diverse and Meaningful Captions
Image Captioning is a task that requires models to acquire a multi-modal
understanding of the world and to express this understanding in natural
language text. While the state-of-the-art for this task has rapidly improved in
terms of n-gram metrics, these models tend to output the same generic captions
for similar images. In this work, we address this limitation and train a model
that generates more diverse and specific captions through an unsupervised
training approach that incorporates a learning signal from an Image Retrieval
model. We summarize previous results and improve the state-of-the-art on
caption diversity and novelty. We make our source code publicly available
online.
Comment: Accepted for presentation at The 27th International Conference on Artificial Neural Networks (ICANN 2018).
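One way to realize such a retrieval-based learning signal is a REINFORCE-style surrogate loss. The sketch below assumes a frozen retrieval model that scores how well a sampled caption retrieves its own image; it illustrates the general idea only, not the paper's exact objective.

```python
import torch

def retrieval_guided_loss(log_probs, retrieval_scores):
    """REINFORCE-style surrogate loss (a hedged sketch): captions that let a
    retrieval model pick out their own image get a higher reward, pushing the
    generator toward specific, image-discriminative wording rather than the
    same generic caption for similar images.
    log_probs:        (batch,) summed log-probability of each sampled caption
    retrieval_scores: (batch,) scores from a frozen image-retrieval model
    """
    reward = retrieval_scores - retrieval_scores.mean()  # baseline-subtracted
    return -(reward.detach() * log_probs).mean()

# Toy usage with random values standing in for real model outputs
loss = retrieval_guided_loss(torch.randn(16), torch.rand(16))
```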
Complete Genome Sequence of Rickettsia parkeri Strain Black Gap
A unique genotype of Rickettsia parkeri, designated R. parkeri strain Black Gap, has thus far been associated exclusively with the North American tick, Dermacentor parumapertus. The complete genome consists of a single circular chromosome of 1,329,522 bp with a G+C content of 32.5%.
Web Search of Fashion Items with Multimodal Querying
In this paper, we introduce a novel multimodal fashion search paradigm where e-commerce data is searched with a multimodal query composed of both an image and text. In this setting, the query image shows a fashion product that the user likes and the query text allows the user to change certain product attributes to fit the product to their desire. Multimodal search gives users the means to clearly express what they are looking for. This is in contrast to current e-commerce search mechanisms, which are cumbersome and often fail to grasp the customer’s needs. Multimodal search requires intermodal representations of visual and textual fashion attributes which can be mixed and matched to form the user’s desired product, and which have a mechanism to indicate when a visual and a textual fashion attribute represent the same concept. With a neural network, we induce a common, multimodal space for visual and textual fashion attributes where their inner product measures their semantic similarity. We build a multimodal retrieval model which operates on the obtained intermodal representations and which ranks images based on their relevance to a multimodal query. We demonstrate that our model is able to retrieve images that both exhibit the necessary query image attributes and satisfy the query texts. Moreover, we show that our model substantially outperforms two state-of-the-art retrieval models adapted to multimodal fashion search.
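As a rough illustration of such an intermodal representation, the sketch below projects visual and textual attribute features into one space where the inner product measures semantic similarity; the additive image + text query combination and all dimensions are assumptions made for the sketch, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalQueryScorer(nn.Module):
    """Sketch of multimodal querying: visual and textual attribute vectors
    are projected into one space where the inner product measures semantic
    similarity; the query combines image features with a text edit."""
    def __init__(self, img_dim=2048, txt_dim=300, emb_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)
        self.txt_proj = nn.Linear(txt_dim, emb_dim)

    def embed_image(self, feats):
        return F.normalize(self.img_proj(feats), dim=-1)

    def embed_text(self, feats):
        return F.normalize(self.txt_proj(feats), dim=-1)

    def score(self, query_img, query_txt, catalog_imgs):
        # Mix-and-match query: the text shifts the image embedding toward
        # the desired attribute (a simple additive combination assumed here)
        q = F.normalize(self.embed_image(query_img) + self.embed_text(query_txt), dim=-1)
        return self.embed_image(catalog_imgs) @ q  # rank catalog by inner product

# Toy ranking over a catalog of 100 products
m = MultimodalQueryScorer()
scores = m.score(torch.randn(2048), torch.randn(300), torch.randn(100, 2048))
top5 = scores.topk(5).indices  # indices of the 5 best-matching images
```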
Deep learning for situational understanding
Situational understanding (SU) requires a combination
of insight — the ability to accurately perceive an existing
situation — and foresight — the ability to anticipate how
an existing situation may develop in the future. SU involves
information fusion as well as model representation and inference.
Commonly, heterogeneous data sources must be exploited in the
fusion process: often including both hard and soft data products.
In a coalition context, data and processing resources will also be
distributed and subjected to restrictions on information sharing.
It will often be necessary for a human to be in the loop in SU
processes, to provide key input and guidance, and to interpret
outputs; this necessitates a degree of transparency in the
processing: systems cannot be “black boxes”. In this
paper, we characterize the Coalition Situational Understanding
(CSU) problem in terms of fusion, temporal, distributed, and
human requirements. There is currently significant interest in
deep learning (DL) approaches for processing both hard and
soft data. We analyze the state-of-the-art in DL in relation to
these requirements for CSU, and identify areas where there is
currently considerable promise, as well as key gaps.
Clean Heat and Energy Efficiency Workforce Assessment
Abstract - To meet the Scottish Government’s ambitious climate change targets, there will need to be a significant increase in the deployment of energy efficiency and low carbon heat measures in domestic and non-domestic buildings in the next decade. To deliver this, the supply chain in Scotland needs to be fit-for-purpose in terms of having the capacity and skills to deliver this scale of technology deployment.
This research explores current and future workforce capabilities around energy efficiency and low carbon heat technologies. It reviews the current capabilities and skills along the supply chain for energy efficiency and low carbon heating technologies in Scotland, identifies the skills gaps, and analyses the potential options for filling these gaps to meet the targets set out in the Heat in Buildings Strategy.