Neural Motifs: Scene Graph Parsing with Global Context
We investigate the problem of producing structured graph representations of
visual scenes. Our work analyzes the role of motifs: regularly appearing
substructures in scene graphs. We present new quantitative insights on such
repeated structures in the Visual Genome dataset. Our analysis shows that
object labels are highly predictive of relation labels but not vice-versa. We
also find that there are recurring patterns even in larger subgraphs: more than
50% of graphs contain motifs involving at least two relations. Our analysis
motivates a new baseline: given object detections, predict the most frequent
relation between object pairs with the given labels, as seen in the training
set. This baseline improves on the previous state-of-the-art by an average
relative gain of 3.6% across evaluation settings. We then introduce Stacked
Motif Networks, a new architecture designed to capture higher order motifs in
scene graphs that further improves over our strong baseline by an average 7.1%
relative gain. Our code is available at github.com/rowanz/neural-motifs.
Comment: CVPR 2018 camera ready
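The frequency baseline described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function and variable names are hypothetical, and the training triples stand in for annotated (subject, relation, object) instances from a dataset such as Visual Genome.

```python
from collections import Counter, defaultdict

def build_frequency_baseline(training_triples):
    """Tally relation frequencies for each (subject, object) label pair.

    training_triples: iterable of (subject_label, relation, object_label)
    tuples drawn from the training set.
    """
    counts = defaultdict(Counter)
    for subj, rel, obj in training_triples:
        counts[(subj, obj)][rel] += 1
    # For each label pair, keep only the single most frequent relation.
    return {pair: ctr.most_common(1)[0][0] for pair, ctr in counts.items()}

def predict_relation(baseline, subj_label, obj_label, default=None):
    """Predict the most frequent training-set relation for a detected pair."""
    return baseline.get((subj_label, obj_label), default)


triples = [
    ("man", "riding", "horse"),
    ("man", "on", "horse"),
    ("man", "riding", "horse"),
]
baseline = build_frequency_baseline(triples)
print(predict_relation(baseline, "man", "horse"))  # riding
```

The point the abstract makes is that this lookup uses no image evidence beyond the detected object labels, which is why its strength is a notable finding about dataset regularity.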
Attribute-Graph: A Graph based approach to Image Ranking
We propose a novel image representation, termed Attribute-Graph, to rank
images by their semantic similarity to a given query image. An Attribute-Graph
is an undirected fully connected graph, incorporating both local and global
image characteristics. The graph nodes characterise objects as well as the
overall scene context using mid-level semantic attributes, while the edges
capture the object topology. We demonstrate the effectiveness of
Attribute-Graphs by applying them to the problem of image ranking. We benchmark
the performance of our algorithm on the 'rPascal' and 'rImageNet' datasets,
which we have created in order to evaluate the ranking performance on complex
queries containing multiple objects. Our experimental evaluation shows that
modelling images as Attribute-Graphs results in improved ranking performance
over existing techniques.
Comment: In IEEE International Conference on Computer Vision (ICCV) 201
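The graph structure the abstract describes (object nodes plus global scene context, fully connected) can be sketched as follows. This is a schematic construction under stated assumptions, not the paper's actual representation: the node name `__scene__` and the attribute-vector inputs are placeholders for the mid-level semantic attributes the authors compute.

```python
from itertools import combinations

def build_attribute_graph(object_attributes, scene_attributes):
    """Build an undirected, fully connected graph over object nodes
    plus one global node carrying scene-level attributes.

    object_attributes: dict mapping object name -> attribute vector
    scene_attributes: attribute vector describing the overall scene
    """
    nodes = dict(object_attributes)
    nodes["__scene__"] = scene_attributes  # global node for scene context
    # Fully connected: one undirected edge per unordered pair of nodes.
    edges = {frozenset(pair) for pair in combinations(nodes, 2)}
    return nodes, edges


nodes, edges = build_attribute_graph(
    {"person": [0.9, 0.1], "bicycle": [0.2, 0.8]},
    [0.5, 0.5],
)
print(len(nodes), len(edges))  # 3 3
```

With n object nodes plus the global node, a fully connected graph has (n+1)·n/2 edges, which is why ranking with such graphs compares complete pairwise structure between query and database images.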
Abstraction and cartographic generalization of geographic user-generated content: use-case motivated investigations for mobile users
On a daily basis, a conventional internet user queries different internet services (available on different platforms) to gather information and make decisions. In most cases, knowingly or not, this user consumes data that has been generated by other internet users about his/her topic of interest (e.g. an ideal holiday destination for a family traveling by van for 10 days). Commercial service providers, such as search engines, travel booking websites, video-on-demand providers, food takeaway mobile apps and the like, have found it useful to rely on the data provided by other users who have commonalities with the querying user. Examples of commonalities are demography, location, interests, internet address, etc. This process has been in practice for more than a decade and helps service providers tailor their results based on the collective experience of the contributors. There has also been interest in various research communities (including GIScience) in analyzing and understanding the data generated by internet users.
The research focus of this thesis is on finding answers for real-world problems in which a user interacts with geographic information. The interactions can be in the form of exploration, querying, zooming and panning, to name but a few. We have aimed our research at investigating the potential of using geographic user-generated content to provide new ways of preparing and visualizing these data. Based on different scenarios that fulfill user needs, we have investigated the potential of finding new visual methods relevant to each scenario. The methods proposed are mainly based on pre-processing and analyzing data that has been offered by data providers (both commercial and non-profit organizations). But in all cases, the contribution of the data was done by ordinary internet users in an active way (compared to passive data collections done by sensors).
The main contributions of this thesis are proposals for new ways of abstracting geographic information based on user-generated content. Addressing different use-case scenarios and based on different input parameters, data granularities and, evidently, geographic scales, we provide proposals for contemporary users (with a focus on users of location-based services, or LBS). The findings are based on methods such as semantic analysis, density analysis and data enrichment. If the findings of this dissertation are realized, LBS users will benefit by being able to explore large amounts of geographic information in more abstract and aggregated ways, obtaining results based on the contributions of other users. The research outcomes lie at the intersection of cartography, LBS and GIScience. In our first use case we propose the inclusion of an extended semantic measure directly in the classic map generalization process. In our second use case we focus on simplifying geographic data depiction by reducing the amount of information using a density-triggered method. Finally, the third use case focuses on summarizing and visually representing relatively large amounts of information by depicting geographic objects matched to the salient topics that emerged from the data.
MirBot: A collaborative object recognition system for smartphones using convolutional neural networks
MirBot is a collaborative application for smartphones that allows users to
perform object recognition. This app can be used to take a photograph of an
object, select the region of interest and obtain the most likely class (dog,
chair, etc.) by means of similarity search using features extracted from a
convolutional neural network (CNN). The answers provided by the system can be
validated by the user so as to improve the results for future queries. All the
images are stored together with a series of metadata, thus enabling a
multimodal incremental dataset labeled with synset identifiers from the WordNet
ontology. This dataset grows continuously thanks to the users' feedback, and is
publicly available for research. This work details the MirBot object
recognition system, analyzes the statistics gathered after more than four years
of usage, describes the image classification methodology, and performs an
exhaustive evaluation using handcrafted features, convolutional neural codes
and different transfer learning techniques. After comparing various models and
transformation methods, the results show that the CNN features maintain the
accuracy of MirBot constant over time, despite the increasing number of new
classes. The app is freely available at the Apple and Google Play stores.
Comment: Accepted in Neurocomputing, 201
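The core classification step the abstract describes, similarity search over stored CNN feature vectors, can be sketched as a nearest-neighbour lookup. This is an illustrative simplification, not MirBot's actual pipeline: the cosine metric and the plain linear scan are assumptions, and the feature vectors here stand in for CNN descriptors of previously validated user images.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def classify_by_similarity(query_features, labeled_features):
    """Return the label of the stored vector most similar to the query.

    labeled_features: list of (label, feature_vector) pairs, standing in
    for the CNN descriptors of images already validated by users.
    """
    best_label, best_sim = None, -1.0
    for label, feats in labeled_features:
        sim = cosine_similarity(query_features, feats)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


database = [("dog", [1.0, 0.1]), ("chair", [0.1, 1.0])]
print(classify_by_similarity([0.9, 0.2], database))  # dog
```

A design property this sketch shares with the system described: because classification is a lookup against stored descriptors rather than a fixed softmax layer, new classes can be added simply by inserting new labeled vectors, which is consistent with the abstract's claim that accuracy holds up as the number of classes grows.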
Semantic Brokering of Multimedia Contents for Smart Delivery of Ubiquitous Services in Pervasive Environments
With the proliferation of modern mobile devices capable of interacting with each other and with the environment in a transparent manner, there is a growing number of applications specifically designed for pervasive and ubiquitous environments. These applications provide services of interest to the user that depend on context information, such as the user's position, their preferences, the capabilities of the device and its available resources. Services have to respond rationally in many different situations, choosing the actions with the result the user most expects, thereby making the environment not only more connected and efficient, but smarter. Here we present a semantic framework that provides the technology for the development of intelligent, context-aware services and their delivery in pervasive and ubiquitous environments.