Learning Social Image Embedding with Deep Multimodal Attention Networks
Learning embeddings of social media data with deep models has attracted extensive
research interest and enabled many applications, such as link
prediction, classification, and cross-modal search. However, for social images
which contain both link information and multimodal contents (e.g., text
description, and visual content), simply employing the embedding learnt from
network structure or data content results in sub-optimal social image
representation. In this paper, we propose a novel social image embedding
approach called Deep Multimodal Attention Networks (DMAN), which employs a deep
model to jointly embed multimodal contents and link information. Specifically,
to effectively capture the correlations between multimodal contents, we propose
a multimodal attention network to encode the fine-grained relations between
image regions and textual words. To leverage the network structure for
embedding learning, a novel Siamese-Triplet neural network is proposed to model
the links among images. With the joint deep model, the learnt embedding can
capture both the multimodal contents and the nonlinear network information.
Extensive experiments are conducted to investigate the effectiveness of our
approach in the applications of multi-label classification and cross-modal
search. Compared to state-of-the-art image embeddings, our proposed DMAN
achieves significant improvement in the tasks of multi-label classification and
cross-modal search.
Reachability Analysis of Graph Modelled Collections
This paper is concerned with potential recall in multimodal
information retrieval in graph-based models. We provide a framework to
leverage individuality and combination of features of different modalities
through our formulation of faceted search. We employ a potential recall
analysis on a test collection to gain insight into the corpus and further
highlight the role of multiple facets, relations between the objects, and
semantic links in recall improvement. We conduct the experiments on
a multimodal dataset containing approximately 400,000 documents and
images. We demonstrate that leveraging multiple facets increases recall,
most notably for very hard topics, by up to 316%.
Mobility mining for time-dependent urban network modeling
170 p. Mobility planning, monitoring and analysis in such a complex ecosystem as a city are very challenging. Our contributions are expected to be a small step forward towards a more integrated vision of mobility management. The main hypothesis behind this thesis is that the transportation offer and the mobility demand are greatly coupled, and thus, both need to be thoroughly and consistently represented in a digital manner so as to enable good quality data-driven advanced analysis. Data-driven analytics solutions rely on measurements. However, sensors only provide a measure of movements that have already occurred (and associated magnitudes, such as vehicles per hour). For a movement to happen there are two main requirements: i) the demand (the need or interest) and ii) the offer (the feasibility and resources). In addition, for good measurement, the sensor needs to be located at an adequate location and be able to collect data at the right moment. All this information needs to be digitalised accordingly in order to apply advanced data analytic methods and take advantage of good digital transportation resource representation. Our main contributions, focused on mobility data mining over urban transportation networks, can be summarised in three groups. The first group consists of a comprehensive description of a digital multimodal transport infrastructure representation from global and local perspectives. The second group is oriented towards matching diverse sensor data onto the transportation network representation, including a quantitative analysis of map-matching algorithms. The final group of contributions covers the prediction of short-term demand based on various measures of urban mobility.
The State of the Art in Multilayer Network Visualization
Modelling relationships between entities in real-world systems with a simple graph is a standard approach. However, reality is better embraced as several interdependent subsystems (or layers). Recently, the concept of a multilayer network model has emerged from the field of complex systems. This model can be applied to a wide range of real-world data sets. Examples of multilayer networks can be found in the domains of life sciences, sociology, digital humanities and more. Within the domain of graph visualization, there are many systems which visualize data sets having many characteristics of multilayer graphs. This report provides a state of the art and a structured analysis of contemporary multilayer network visualization, not only for researchers in visualization, but also for those who aim to visualize multilayer networks in the domain of complex systems, as well as those developing systems across application domains. We have explored the visualization literature to survey visualization techniques suitable for multilayer graph visualization, as well as tools, tasks and analytic techniques from within application domains. This report also identifies the outstanding challenges for multilayer graph visualization and suggests future research directions for addressing them.
Semantics and statistics for automated image annotation
Automated image annotation consists of a number of techniques that aim to find the correlation between words and image features such as colour, shape, and texture to provide correct annotation words to images. In particular, approaches based on Bayesian theory use machine-learning techniques to learn statistical models from a training set of pre-annotated images and apply them to generate annotations for unseen images.
The focus of this thesis lies in demonstrating that an approach, which goes beyond learning the statistical correlation between words and visual features and also exploits information about the actual semantics of the words used in the annotation process, is able to improve the performance of probabilistic annotation systems. Specifically, I present three experiments. Firstly, I introduce a novel approach that automatically refines the annotation words generated by a non-parametric density estimation model using semantic relatedness measures. Initially, I consider semantic measures based on co-occurrence of words in the training set. However, this approach can exhibit limitations, as its performance depends on the quality and coverage provided by the training data. For this reason, I devise an alternative solution that combines semantic measures based on knowledge sources, such as WordNet and Wikipedia, with word co-occurrence in the training set and on the web, to achieve statistically significant results over the baseline. Secondly, I investigate the effect of using semantic measures inside an evaluation measure that computes the performance of an automated image annotation system, whose annotation words adopt the hierarchical structure of an ontology. This is the case of the ImageCLEF2009 collection. Finally, I propose a Markov Random Field that exploits the semantic context dependencies of the image. The best result obtains a mean average precision of 0.32, which is consistent with the state-of-the-art in automated image annotation for the Corel 5k dataset.
Computational solutions for omics data
High-throughput experimental technologies are generating increasingly massive and complex genomic data sets. The sheer enormity and heterogeneity of these data threaten to make the arising problems computationally infeasible. Fortunately, powerful algorithmic techniques lead to software that can answer important biomedical questions in practice. In this Review, we sample the algorithmic landscape, focusing on state-of-the-art techniques, the understanding of which will aid the bench biologist in analysing omics data. We spotlight specific examples that have facilitated and enriched analyses of sequence, transcriptomic and network data sets. National Institutes of Health (U.S.) (Grant GM081871)
Exploiting multimodality and structure in world representations
An essential aim of artificial intelligence research is to design agents that will eventually cooperate with humans within the real world. To this end, embodied learning is emerging as one of the most important efforts contributed by the machine learning community towards this goal. Recently developing sub-fields concern various aspects of such systems: visual reasoning, language representations, causal mechanisms, and robustness to out-of-distribution inputs, to name only a few.
In particular, multimodal learning and language grounding are vital to achieving a strong understanding of the real world. Humans build internal representations via interacting with their environment, learning complex associations between visual, auditory and linguistic concepts. Since the world abounds with structure, graph-based encodings are also likely to be incorporated in reasoning and decision-making modules. Furthermore, these relational representations are rather symbolic in nature (providing advantages over other formats, such as raw pixels) and can encode various types of links (temporal, causal, spatial) which can be essential for understanding and acting in the real world.
This thesis presents three research works that study and develop likely aspects of future intelligent agents. The first contribution centers on vision-and-language learning, introducing a challenging embodied task that shifts the focus of an existing one to the visual reasoning problem. By extending popular visual question answering (VQA) paradigms, I also designed several models that were evaluated on the novel dataset. This produced initial performance estimates for environment understanding, through the lens of a more challenging VQA downstream task. The second work presents two ways of obtaining hierarchical representations of graph-structured data. These methods either scaled to much larger graphs than the ones processed by the best-performing method at the time, or incorporated theoretical properties via the use of topological data analysis algorithms. Both approaches competed with contemporary state-of-the-art graph classification methods, even outside social domains in the second case, where the inductive bias was PageRank-driven. Finally, the third contribution delves further into relational learning, presenting a probabilistic treatment of graph representations in complex settings such as few-shot, multi-task learning and scarce-labelled data regimes. By adding relational inductive biases to neural processes, the resulting framework can model an entire distribution of functions which generate datasets with structure. This yielded significant performance gains, especially in the aforementioned complex scenarios, with semantically-accurate uncertainty estimates that drastically improved over the neural process baseline. This type of framework may eventually contribute to developing lifelong-learning systems, due to its ability to adapt to novel tasks and distributions.
The benchmark, methods and frameworks that I have devised during my doctoral studies suggest important future directions for embodied and graph representation learning research. These areas have increasingly proved their relevance to designing intelligent and collaborative agents, which we may interact with in the near future. By addressing several challenges in this problem space, my contributions therefore take a few steps towards building machine learning systems to be deployed in real-life settings. DREAM CD