Deep Learning Relevance: Creating Relevant Information (as Opposed to Retrieving it)
What if Information Retrieval (IR) systems did not just retrieve relevant
information that is stored in their indices, but could also "understand" it and
synthesise it into a single document? We present a preliminary study that makes
a first step towards answering this question. Given a query, we train a
Recurrent Neural Network (RNN) on existing information relevant to that query.
We then use the RNN to "deep learn" a single, synthetic and, we assume,
relevant document for that query. We design a crowdsourcing experiment to
assess how relevant the "deep learned" document is, compared to existing
relevant documents. Users are shown a query and four wordclouds (of three
existing relevant documents and our deep learned synthetic document). The
synthetic document is ranked, on average, as the most relevant of all.
Comment: Neu-IR '16 SIGIR Workshop on Neural Information Retrieval, July 21,
2016, Pisa, Italy.
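The abstract does not spell out the architecture or training setup; purely as an illustration, the sketch below assumes a small character-level LSTM language model (in PyTorch) fitted on the text of the documents relevant to one query and then sampled to produce a synthetic document. The corpus, hyperparameters, and training budget are placeholder assumptions, not the authors' configuration.

```python
# Illustrative sketch only (assumed setup, not the authors' code): fit a
# character-level LSTM language model on documents relevant to one query,
# then sample from it to "deep learn" a synthetic document for that query.
import torch
import torch.nn as nn

relevant_docs = [
    "text of the first document judged relevant to the query ...",
    "text of the second document judged relevant to the query ...",
]
text = " ".join(relevant_docs)
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

class CharLM(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        h, state = self.rnn(self.emb(x), state)
        return self.out(h), state

model = CharLM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()
seq_len = 32

for step in range(200):  # illustrative training budget
    i = torch.randint(0, len(data) - seq_len - 1, (1,)).item()
    x = data[i:i + seq_len].unsqueeze(0)
    y = data[i + 1:i + seq_len + 1].unsqueeze(0)
    logits, _ = model(x)
    loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sample the synthetic "deep learned" document character by character.
model.eval()
idx, state, out = data[:1].unsqueeze(0), None, []
with torch.no_grad():
    for _ in range(300):
        logits, state = model(idx, state)
        probs = torch.softmax(logits[0, -1], dim=-1)
        idx = torch.multinomial(probs, 1).unsqueeze(0)
        out.append(itos[idx.item()])
print("".join(out))
```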
Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application to IR
We present two novel models of document coherence and their application to
information retrieval (IR). Both models approximate document coherence using
discourse entities, e.g. the subject or object of a sentence. Our first model
views text as a Markov process generating sequences of discourse entities
(entity n-grams); we use the entropy of these entity n-grams to approximate the
rate at which new information appears in text, reasoning that as more new words
appear, the topic increasingly drifts and text coherence decreases. Our second
model extends the work of Guinaudeau & Strube [28] that represents text as a
graph of discourse entities, linked by different relations, such as their
distance or adjacency in text. We use several graph topology metrics to
approximate different aspects of the discourse flow that can indicate
coherence, such as the average clustering or betweenness of discourse entities
in text. Experiments with several instantiations of these models show that: (i)
our models perform on a par with two other well-known models of text coherence
even without any parameter tuning, and (ii) reranking retrieval results
according to their coherence scores gives notable performance gains, confirming
a relation between document coherence and relevance. This work contributes two
novel models of document coherence, the application of which to IR complements
recent work on integrating document cohesiveness or comprehensibility into
ranking [5, 56].
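The abstract gives neither the exact entity extraction nor the precise edge definitions, so the sketch below is only a rough approximation of the two models: hand-written discourse entities per sentence, entity-bigram entropy for the first model, and a networkx entity graph with average clustering and betweenness for the second.

```python
# Rough sketch (illustrative assumptions, not the paper's exact definitions):
# approximate document coherence (i) by the entropy of discourse-entity
# bigrams and (ii) by topology metrics of an entity graph in the spirit of
# Guinaudeau & Strube [28].
from collections import Counter
from math import log2

import networkx as nx

# One list of discourse entities (e.g. sentence subjects/objects) per sentence;
# in practice these would come from a parser, here they are hand-written.
doc_entities = [["court", "ruling"], ["ruling", "appeal"], ["appeal", "court"]]

# Model 1: entropy of entity bigrams over the flattened entity sequence.
# Higher entropy ~ more new entities ~ faster topic drift ~ lower coherence.
seq = [e for sent in doc_entities for e in sent]
bigrams = Counter(zip(seq, seq[1:]))
total = sum(bigrams.values())
entropy = -sum((c / total) * log2(c / total) for c in bigrams.values())

# Model 2: a graph of discourse entities, linked when they occur in adjacent
# sentences; clustering and betweenness summarise the discourse flow.
g = nx.Graph()
for prev, curr in zip(doc_entities, doc_entities[1:]):
    for e1 in prev:
        for e2 in curr:
            if e1 != e2:
                g.add_edge(e1, e2)
avg_clustering = nx.average_clustering(g)
avg_betweenness = sum(nx.betweenness_centrality(g).values()) / g.number_of_nodes()

print(entropy, avg_clustering, avg_betweenness)
```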
Co-simulation of Continuous Systems: A Tutorial
Co-simulation consists of the theory and techniques to enable global
simulation of a coupled system via the composition of simulators. Despite the
large number of applications and growing interest in the challenges, the field
remains fragmented into multiple application domains, with limited sharing of
knowledge.
This tutorial aims at introducing co-simulation of continuous systems,
targeted at researchers new to the field.
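For a concrete picture of what "composing simulators" means, the sketch below shows a minimal fixed-step master algorithm with Jacobi-type coupling, one of several schemes such a tutorial would cover; the two toy ODE simulators and the macro step size are assumptions for illustration only.

```python
# Minimal sketch (illustrative assumptions, not taken from the tutorial):
# two simulators, each owning one state variable, advance over a macro step
# while their inputs are frozen at the values exchanged at the previous
# communication point (a Jacobi-type coupling scheme).

def sim_a_step(x, u, h):
    # Simulator A integrates dx/dt = -x + u internally with forward Euler.
    return x + h * (-x + u)

def sim_b_step(y, u, h):
    # Simulator B integrates dy/dt = -2*y + u internally with forward Euler.
    return y + h * (-2 * y + u)

def cosimulate(x0=1.0, y0=0.0, macro_step=0.1, t_end=2.0):
    x, y, t = x0, y0, 0.0
    trace = [(t, x, y)]
    while t < t_end:
        u_a, u_b = y, x                     # exchange outputs at the communication point
        x = sim_a_step(x, u_a, macro_step)  # both advance with frozen inputs
        y = sim_b_step(y, u_b, macro_step)
        t += macro_step
        trace.append((t, x, y))
    return trace

print(cosimulate()[-1])
```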
On Using Toeplitz and Circulant Matrices for Johnson-Lindenstrauss Transforms
The Johnson-Lindenstrauss lemma is one of the cornerstone results in
dimensionality reduction. It says that given $0 < \varepsilon < 1$, for any set $P$ of $N$
vectors in $\mathbb{R}^n$, there exists a mapping $f : P \to \mathbb{R}^m$ such
that $f$ preserves all pairwise distances between vectors in $P$ to within a factor of
$(1 \pm \varepsilon)$ if $m = O(\varepsilon^{-2} \lg N)$. Much effort has gone
into developing fast embedding algorithms, with the Fast Johnson-Lindenstrauss
transform of Ailon and Chazelle being one of the most well-known techniques.
The current fastest algorithm that yields the optimal $m = O(\varepsilon^{-2} \lg N)$
dimensions has an embedding time of $O(n \lg n + \varepsilon^{-2} \lg^3 N)$. An exciting approach towards improving this, due to
Hinrichs and Vybíral, is to use a random $m \times n$ Toeplitz matrix for the
embedding. Using the Fast Fourier Transform, the embedding of a vector can then be
computed in $O(n \lg m)$ time. The big question is of course whether
$m = O(\varepsilon^{-2} \lg N)$ dimensions suffice for this technique. If so, this
would end a decades-long quest to obtain faster and faster
Johnson-Lindenstrauss transforms. The current best analysis of the embedding of
Hinrichs and Vybíral shows that $m = O(\varepsilon^{-2} \lg^2 N)$ dimensions
suffice. The main result of this paper is a proof that this analysis
unfortunately cannot be tightened any further, i.e., there exists a set of $N$
vectors requiring $m = \Omega(\varepsilon^{-2} \lg^2 N)$ for the Toeplitz
approach to work.
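As a concrete illustration of the FFT-based embedding, the numpy sketch below uses a partial circulant matrix (a special case of a Toeplitz matrix) with a random sign diagonal, in the spirit of the Hinrichs-Vybíral construction; the Gaussian generator vector, the $1/\sqrt{m}$ scaling, and the parameter choices are illustrative assumptions rather than the paper's exact statement.

```python
# Sketch of the FFT-based embedding with a partial circulant matrix (a special
# case of a Toeplitz matrix) and a random sign diagonal; generator distribution,
# scaling and parameter choices are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def partial_circulant_embed(v, m, c, signs):
    # Multiply v by the sign diagonal, then by the circulant matrix generated
    # by c (a circular convolution, computed via FFT), keep the first m rows
    # and rescale by 1/sqrt(m).
    w = np.fft.ifft(np.fft.fft(c) * np.fft.fft(signs * v)).real
    return w[:m] / np.sqrt(m)

n, m = 1024, 64
c = rng.standard_normal(n)          # first column of the circulant matrix
signs = rng.choice([-1.0, 1.0], n)  # Rademacher diagonal D
x, z = rng.standard_normal(n), rng.standard_normal(n)

# Pairwise distance before and after the embedding; how large m must be for
# the distortion to be small is exactly the question the paper settles.
print(np.linalg.norm(x - z),
      np.linalg.norm(partial_circulant_embed(x - z, m, c, signs)))
```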
Atomic structure optimization with machine-learning enabled interpolation between chemical elements
We introduce a computational method for global optimization of structure and
ordering in atomic systems. The method relies on interpolation between chemical
elements, which is incorporated in a machine learning structural fingerprint.
The method is based on Bayesian optimization with Gaussian processes and is
applied to the global optimization of Au-Cu bulk systems, Cu-Ni surfaces with
CO adsorption, and Cu-Ni clusters. The method consistently identifies
low-energy structures, which are likely to be the global minima of the energy.
For the investigated systems with 23-66 atoms, the number of required energy
and force calculations is in the range 3-75.
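The element-interpolating fingerprint and the energy/force evaluations are specific to the paper; the sketch below only illustrates the surrounding Bayesian-optimization loop with a Gaussian-process surrogate (scikit-learn) and a lower-confidence-bound selection rule, with placeholder fingerprint and energy functions.

```python
# Sketch of the Bayesian-optimization loop only (an illustration, not the
# paper's method): a Gaussian-process surrogate is fitted to fingerprints of
# evaluated candidates and the next candidate is chosen by a lower confidence
# bound. The fingerprint and energy functions are placeholders.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def fingerprint(structure):
    # Placeholder: the paper uses an ML fingerprint that interpolates between
    # chemical elements; here we simply return the raw coordinate vector.
    return np.asarray(structure).ravel()

def energy(structure):
    # Placeholder for the expensive energy/force evaluation (e.g. DFT).
    x = np.asarray(structure).ravel()
    return float(np.sum((x - 0.5) ** 2))

candidates = [rng.random(6) for _ in range(200)]        # hypothetical trial structures
evaluated = [candidates.pop() for _ in range(5)]        # initial random evaluations
energies = [energy(s) for s in evaluated]

for _ in range(10):  # illustrative optimization budget
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
    gp.fit(np.array([fingerprint(s) for s in evaluated]), energies)
    X = np.array([fingerprint(s) for s in candidates])
    mu, sigma = gp.predict(X, return_std=True)
    best = int(np.argmin(mu - 2.0 * sigma))             # lower confidence bound
    s = candidates.pop(best)
    evaluated.append(s)
    energies.append(energy(s))

print(min(energies))
```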
Prediction of the Free Energy of Binding for Cyclodextrin-Steroid Complexes: Phase Solubility and Molecular Dynamics Studies
Flight Planning in Free Route Airspaces
We consider the problem of finding cheapest flight routes through free route airspaces in a 2D setting. We subdivide the airspace into regions determined by a Voronoi subdivision around the points from a weather forecast. This gives rise to a regular grid of rectangular regions (quads) with every quad having an associated vector-weight that represents the wind magnitude and direction. Finding a cheapest path in this setting corresponds to finding a piecewise linear path determined by points on the boundaries of the quads. In our solution approach, we discretize such boundaries by introducing border points and only consider segments connecting border points belonging to the same quad. While classic shortest path graph algorithms are available and applicable to the graphs originating from these border points, we design an algorithm that exploits the geometric structure of our scenario and show that this algorithm is more efficient in practice than classic graph-based algorithms. In particular, it scales better with the number of quads in the subdivision of the airspace, making it possible to find more accurate routes or to solve larger problems.
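The abstract does not give the cost model or the discretization density; the sketch below illustrates only the classic graph baseline it mentions, with hypothetical per-quad wind vectors, a simple along-track ground-speed cost, border points on unit-square quads, and networkx's Dijkstra (the paper's own geometric algorithm is not reproduced here).

```python
# Sketch of the classic graph baseline described above (assumptions: unit
# square quads, a hypothetical wind field, a simple along-track ground-speed
# cost, and Dijkstra from networkx).
import math

import networkx as nx

quads = {(i, j): (0.5 * (j - i), 2.0) for i in range(4) for j in range(4)}  # (u, v) wind per quad
airspeed = 10.0
pts_per_side = 3  # border points per side of each quad

def travel_cost(p, q, wind):
    # Time to fly the straight segment p -> q, approximating ground speed as
    # airspeed plus the wind component along the segment.
    dx, dy = q[0] - p[0], q[1] - p[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return 0.0
    tailwind = (wind[0] * dx + wind[1] * dy) / dist
    return dist / max(airspeed + tailwind, 1e-6)

g = nx.DiGraph()
for (i, j), wind in quads.items():
    # Border points on the four sides of quad (i, j).
    border = []
    for k in range(1, pts_per_side + 1):
        t = k / (pts_per_side + 1)
        border += [(i + t, j), (i + t, j + 1), (i, j + t), (i + 1, j + t)]
    # Segments connecting border points that belong to the same quad.
    for p in border:
        for q in border:
            if p != q:
                g.add_edge(p, q, weight=travel_cost(p, q, wind))

source, target = (0, 0.25), (4, 3.75)  # border points on the outer boundary
cost, path = nx.single_source_dijkstra(g, source, target, weight="weight")
print(cost, len(path))
```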