Modeling Users' Information Needs in a Document Recommender for Meetings
People are surrounded by an unprecedented wealth of information. Access to it depends on the availability of suitable search engines, but even when these are available, people often do not initiate a search, because their current activity does not allow them to, or because they are unaware that the information exists. Just-in-time retrieval brings a radical change to the process of query-based retrieval by proactively retrieving documents relevant to users' current activities, in an easily accessible and non-intrusive manner. This thesis presents a novel set of methods intended to improve the relevance of a just-in-time retrieval system, specifically a document recommender system designed for conversations, in terms of precision and diversity of results. Additionally, we designed an evaluation protocol to compare the methods proposed in this thesis with existing ones using crowdsourcing. In contrast to previous systems, which model users' information needs by extracting keywords from clean and well-structured texts, this system models them from conversation transcripts, which contain noise from automatic speech recognition (ASR) and have a free structure, often switching between several topics. To deal with these issues, we first propose a novel keyword extraction method which preserves both the relevance and the topical diversity of the conversation, to properly capture users' possible needs with minimal ASR noise. Implicit queries are then built from these keywords. However, the presence of multiple unrelated topics in one query introduces significant noise into the retrieval results. To reduce this effect, we separate users' needs by topically clustering keyword sets into several subsets, or implicit queries. We introduce a merging method which combines the results of the multiple queries prepared from the users' conversation to generate a concise, diverse and relevant list of documents.
This method ensures that the system does not distract its users from their current conversation by frequently recommending a large number of documents. Moreover, we address the problem of explicit queries that users may ask during a conversation. We introduce a query refinement method which leverages the conversation context to answer the users' information needs without asking for additional clarifications, therefore, again, avoiding distracting users during their conversation. Finally, we implemented the end-to-end document recommender system by integrating the ideas proposed in this thesis, and proposed an evaluation scenario with human users in a brainstorming meeting.
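The cluster-then-merge pipeline described above can be sketched as follows. This is an illustrative Python sketch under assumptions of my own (the topic assignment, the round-robin merge, and all names are invented for illustration), not the thesis's actual implementation:

```python
# Sketch: cluster extracted keywords into topic-coherent implicit queries,
# then merge the per-query ranked result lists into one concise, diverse list.
from collections import defaultdict

def cluster_keywords(keywords, topic_of):
    """Group keywords into topic-coherent subsets (one implicit query each)."""
    clusters = defaultdict(list)
    for kw in keywords:
        clusters[topic_of(kw)].append(kw)
    return [sorted(kws) for _, kws in sorted(clusters.items())]

def merge_results(ranked_lists, k=5):
    """Merge per-query ranked lists by round-robin interleaving,
    skipping duplicates, and keep only the top k documents."""
    merged, seen = [], set()
    for rank in range(max(map(len, ranked_lists))):
        for lst in ranked_lists:
            if rank < len(lst) and lst[rank] not in seen:
                seen.add(lst[rank])
                merged.append(lst[rank])
    return merged[:k]

topics = {"neural": "ml", "training": "ml", "budget": "finance", "cost": "finance"}
queries = cluster_keywords(["neural", "budget", "training", "cost"], topics.get)
# queries -> [['budget', 'cost'], ['neural', 'training']]
results = merge_results([["d1", "d2", "d3"], ["d4", "d2", "d5"]], k=4)
# results -> ['d1', 'd4', 'd2', 'd3']
```

The round-robin merge is one simple way to keep the final list short while drawing from every topic; the thesis's merging method may weight queries differently.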
A Framework for Exploiting Emergent Behaviour to capture 'Best Practice' within a Programming Domain
Inspection is a formalised process for reviewing an artefact in software engineering.
It is proven to significantly reduce defects, to ensure that what is delivered is what is
required, and that the finished product is effective and robust.
Peer code review is a less formal inspection of code, normally classified as an
inadequate or substandard form of Inspection. Although it has an increased risk of not
locating defects, it has been shown to improve the knowledge and programming
skills of its participants.
This thesis examines the process of peer code review, comparing it to Inspection,
and attempts to describe how an informal code review can improve the knowledge
and skills of its participants by deploying an agent-oriented approach.
During a review the participants discuss defects, recommendations and solutions, or
more generally their own experience. It is this instant adaptability to new
information that gives the review process the ability to improve knowledge. This
observed behaviour can be described as the emergent behaviour of the group of
programmers during the review.
The wider distribution of knowledge is currently only performed by programmers
attending other reviews. To maximise the benefits of peer code review, a
mechanism is needed by which the findings from one team can be captured and
propagated to other reviews and teams throughout an establishment.
A prototype multi-agent system is developed with the aim of capturing the emergent
properties of a team of programmers. As the interactions between the team members
are unstructured and the information traded is dynamic, a distributed adaptive system
is required to provide communication channels for the team and to provide a
foundation for the knowledge shared. Software agents are capable of adaptivity and
learning. Multi-agent systems are particularly effective at being deployed within
distributed architectures and are believed to be able to capture emergent behaviour.
The prototype system illustrates that the learning mechanism within the software
agents provides a solid foundation upon which the ability to detect defects can be
learnt. It also demonstrates that the multi-agent approach is well suited to providing
free-flowing communication of ideas between programmers, not only to achieve the
sharing of defects and solutions but also at a high enough level to capture social
information. It is assumed that this social information is a measure of one element of
the review process's emergent behaviour.
The system is capable of monitoring the team-perceived abilities of programmers,
those who are influential on the programming style of others, and the issues upon
which programmers agree or disagree. If the disagreements are classified as
unimportant or stylistic issues, can it not therefore be assumed that all agreements
are concepts of "Best Practice"?
The conclusion is reached that code review is not a substandard Inspection but is in
fact complementary to the Inspection model, as the latter improves the process of
locating and identifying bugs while the former improves the knowledge and skill of
the programmers, and therefore the chance of bugs not being encoded to start with.
The prototype system demonstrates that it is possible to capture best practice from a
review team and that agents are well suited to the task. The performance criteria of
such a system have also been captured.
The prototype system has also shown that a reliable level of learning can be attained
for a real world task. The innovative way of concurrently deploying multiple agents
which use different approaches to achieve the same goal shows remarkable
robustness when learning from small example sets.
The novel way in which autonomy is promoted within the agents' design but
constrained within the agent community allows the system to provide a sufficiently
flexible communications structure to capture emergent social behaviour, whilst
ensuring that the agents remain committed to their own goals.
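The idea of monitoring the issues on which programmers agree, and treating unanimous agreements as candidate "Best Practice", can be sketched minimally. This is an assumption-laden illustration, not the thesis's prototype multi-agent system; the class names, issues, and filtering rule are invented:

```python
# Sketch: each agent records its programmer's stance on review issues;
# issues every participant agrees on become "Best Practice" candidates
# that could be propagated to other review teams.
class ReviewAgent:
    def __init__(self, programmer):
        self.programmer = programmer
        self.stances = {}          # issue -> True (agrees) / False (disagrees)

    def observe(self, issue, agrees):
        self.stances[issue] = agrees

def best_practice_candidates(agents):
    """Return the issues on which all participating agents recorded agreement."""
    issues = set().union(*(a.stances for a in agents))
    return sorted(
        issue for issue in issues
        if all(a.stances.get(issue) is True for a in agents)
    )

a, b = ReviewAgent("alice"), ReviewAgent("bob")
a.observe("check-null-before-deref", True)
b.observe("check-null-before-deref", True)
a.observe("tabs-vs-spaces", True)
b.observe("tabs-vs-spaces", False)   # a stylistic disagreement, filtered out
# best_practice_candidates([a, b]) -> ['check-null-before-deref']
```

A real system would, as the thesis argues, also need adaptive learning within the agents and a communication structure flexible enough to capture social information, which this toy omits.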
Automatic extraction of concepts from texts and applications
The extraction of relevant terms from texts is an extensively researched task in Text Mining.
Relevant terms have been applied in areas such as Information Retrieval or document clustering and classification. However, relevance has a rather fuzzy nature, since the classification of some terms as relevant or not relevant is not consensual. For instance, while words such as "president" and "republic" are generally considered relevant by human evaluators, and words like "the" and "or" are not, terms such as "read" and "finish" gather no consensus about their semantics and informativeness.
Concepts, on the other hand, have a less fuzzy nature. Therefore, instead of deciding
on the relevance of a term during the extraction phase, as most extractors do, I propose to first extract, from texts, what I have called generic concepts (all concepts) and postpone the decision about relevance for downstream applications, according to their needs.
For instance, a keyword extractor may assume that the most relevant keywords are the
most frequent concepts in the documents. Moreover, most statistical extractors are incapable of extracting single-word and multi-word expressions using the same methodology.
These factors led to the development of the ConceptExtractor, a statistical and
language-independent methodology which is explained in Part I of this thesis.
In Part II, I will show that the automatic extraction of concepts has great applicability.
For instance, for the extraction of keywords from documents, using the Tf-Idf metric
only on concepts yields better results than using Tf-Idf without concepts, especially for
multi-word expressions. In addition, since concepts can be semantically related to other concepts,
this allows us to build implicit document descriptors. These applications led to published work. Finally, I will present some work that, although not yet published, is briefly discussed in this document.
Fundação para a Ciência e a Tecnologia - SFRH/BD/61543/200
Interaction Tree Specifications: A Framework for Specifying Recursive, Effectful Computations That Supports Auto-Active Verification
This paper presents a specification framework for monadic, recursive, interactive programs that supports auto-active verification, an approach that combines user-provided guidance with automatic verification techniques. This verification tool is designed to have the flexibility of a manual approach to verification along with the usability benefits of automatic approaches. We accomplish this by augmenting Interaction Trees, a Coq data structure for representing effectful computations, with logical quantifier events. We show that this yields a language of specifications that are easy to understand, automatable, and powerful enough to handle properties that involve non-termination. Our framework is implemented as a library in Coq. We demonstrate the effectiveness of this framework by verifying real, low-level code.
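To give a flavour of "quantifier events", here is a loose Python analogue, an assumption of this edit rather than the paper's Coq library: computations end in `Ret` leaves, and specifications add `Forall`/`Exists` nodes whose obligations a trivial checker discharges over finite domains.

```python
# Toy analogue of specifications with logical quantifier events:
# spec trees whose internal nodes quantify over a finite domain of choices.
from dataclasses import dataclass

@dataclass
class Ret:            # the computation returned a value
    value: object

@dataclass
class Forall:         # spec event: body must hold for every choice in domain
    domain: tuple
    body: object      # callable: choice -> spec tree

@dataclass
class Exists:         # spec event: body must hold for some choice in domain
    domain: tuple
    body: object

def satisfies(spec, pred):
    """Check a finite spec tree; Ret leaves are judged by `pred`."""
    if isinstance(spec, Ret):
        return pred(spec.value)
    if isinstance(spec, Forall):
        return all(satisfies(spec.body(x), pred) for x in spec.domain)
    if isinstance(spec, Exists):
        return any(satisfies(spec.body(x), pred) for x in spec.domain)
    raise TypeError(spec)

# "for every n in 0..3 there exists an m in 0..3 with n + m == 3"
spec = Forall(tuple(range(4)),
              lambda n: Exists(tuple(range(4)),
                               lambda m: Ret(n + m)))
# satisfies(spec, lambda v: v == 3) -> True
```

The real framework handles effectful, possibly non-terminating programs coinductively in Coq; this finite checker only illustrates how quantifier nodes interleave with computation.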
Conflict resolution algorithms for optimal trajectories in presence of uncertainty
International Mention in the doctoral degree.
The objective of the work presented in this Ph.D. thesis is to develop a novel
method to address the aircraft-obstacle avoidance problem in presence of uncertainty,
providing optimal trajectories in terms of risk of collision and time of flight. The
obstacle avoidance maneuver is the result of a Conflict Detection and Resolution
(CD&R) algorithm prepared for a potential conflict between an aircraft and a fixed
obstacle whose position is uncertain.
Due to the growing interest in Unmanned Aerial System (UAS) operations, the
CD&R topic has been intensively discussed and tackled in the literature over the last
10 years. One of the crucial aspects that needs to be addressed for a safe and efficient
integration of UAS vehicles in non-segregated airspace is the CD&R activity. The
inherent nature of UAS, and the dynamic environment they are intended to work
in, make handling scenarios in the presence of uncertainty a central challenge for
CD&R algorithms. Modeling uncertainty sources accurately, and predicting future
trajectories while taking stochastic events into account, are thorny issues in developing
CD&R algorithms for optimal trajectories. Uncertainty about the origin of threats,
variable weather hazards, and sensing and communication errors are only some of
the possible uncertainty sources that jeopardize air vehicle operations.
In this work, conflict is defined as the violation of the minimum distance between
a vehicle and a fixed obstacle, and conflict avoidance maneuvers can be achieved
by only varying the aircraft heading angle. The CD&R problem, formulated as an
Optimal Control Problem (OCP), is solved via an indirect optimal control method.
Necessary conditions of optimality, namely, the Euler-Lagrange equations, obtained
from calculus of variations, are applied to the vehicle dynamics and the obstacle
constraint modeled as a stochastic variable. The implicit equations of optimality lead
to the formulation of a Multipoint Boundary Value Problem (MPBVP) whose solution is in general not trivial. The structure of the optimal trajectory is inferred from the type
of path constraint, and the trend of the Lagrange multiplier is analyzed along the optimal
route. The MPBVP is firstly approximated by Taylor polynomials, and then solved
via Differential Algebra (DA) techniques.
The solution of the OCP is therefore a set of polynomials approximating the
optimal controls in the presence of uncertainty, i.e., the optimal heading angles that
minimize the time of flight while taking into account the uncertainty of the obstacle
position. Once the obstacle is detected by on-board sensors, this method provides a
useful tool that allows the pilot, or remote controller, to choose the best trade-off
between optimality and collision risk for the avoidance maneuver. Monte Carlo simulations
are run to validate the results and the effectiveness of the presented method.
The method is also applicable to CD&R problems in the presence of storms, other
aircraft, or other types of airspace hazards characterized by constant relative
velocity with respect to the aircraft itself.
Doctoral Programme in Fluid Mechanics of Universidad Carlos III de Madrid, Universidad de Jaén, Universidad de Zaragoza, Universidad Nacional de Educación a Distancia, Universidad Politécnica de Madrid and Universidad Rovira i Virgili.
Committee: President: Carlo Novara; Secretary: Lucia Pallotino; Members: Manuel Sanjurjo Rivo, Yoshinori Matsuno, Alfonso Valenzuela Romer
Semi-automated co-reference identification in digital humanities collections
Locating specific information within museum collections represents a significant challenge for collection users.
Even when the collections and catalogues exist in a searchable digital format, formatting differences and the imprecise nature of the information to be searched mean that information can be recorded in a large number of different ways. This variation exists not just between different collections, but also within individual ones. This means that traditional information retrieval techniques are badly suited to the challenges of locating particular information in digital humanities collections and searching, therefore, takes an excessive amount of time and resources.
This thesis focuses on a particular search problem, that of co-reference identification: the process of identifying when the same real-world item is recorded in multiple digital locations. In this thesis, a real-world example of a co-reference identification problem for digital humanities collections is identified and explored, in particular the time-consuming nature of identifying co-referent records. In order to address the identified problem, this thesis presents a novel method for co-reference identification between digitised records in humanities collections. Whilst the specific focus of this thesis is co-reference identification, elements of the method described also have applications for general information retrieval.
The new co-reference method uses elements from a broad range of areas, including query expansion, co-reference identification, short-text semantic similarity and fuzzy logic. The new method was tested against real-world collections information, the results of which suggest that, in terms of the quality of the co-referent matches found, the new co-reference identification method is at least as effective as a manual search. The number of co-referent matches found, however, is higher using the new method.
The approach presented here is capable of searching collections stored using differing metadata schemas. More significantly, the approach is capable of identifying potential co-reference matches despite the highly heterogeneous and syntax independent nature of the Gallery, Library Archive and Museum (GLAM) search space and the photo-history domain in particular. The most significant benefit of the new method is, however, that it requires comparatively little manual intervention. A co-reference search using it has, therefore, significantly lower person hour requirements than a manually conducted search.
In addition to the overall co-reference identification method, this thesis also presents:
• A novel and computationally lightweight short-text semantic similarity metric. This new metric has a significantly higher throughput than the current prominent techniques, with only a negligible drop in accuracy.
• A novel method for comparing photographic processes in the presence of variable terminology and inaccurate field information. This is the first computational approach to do so.
AHR
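One plausible shape for a lightweight short-text similarity with a fuzzy fallback is sketched below. This is an assumption of this edit, not the thesis's actual metric: token-level cosine similarity, backed by character-bigram cosine so that spelling variants in heterogeneous catalogue records still match.

```python
# Sketch: cheap short-text similarity = max of token-level cosine and
# character-bigram cosine (the bigram channel tolerates spelling variants).
import math
from collections import Counter

def bigrams(word):
    return Counter(word[i:i + 2] for i in range(len(word) - 1))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def short_text_similarity(s, t):
    tok_sim = cosine(Counter(s.lower().split()), Counter(t.lower().split()))
    chr_sim = cosine(bigrams(s.lower().replace(" ", "")),
                     bigrams(t.lower().replace(" ", "")))
    return max(tok_sim, chr_sim)   # fuzzy fallback for near-miss spellings

# word order is ignored, so reordered catalogue fields still match:
same = short_text_similarity("albumen print portrait", "portrait albumen print")
```

Both channels are linear in the text length with no external resources, which is the kind of throughput-versus-accuracy trade-off the bullet above describes.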
3D CNN methods in biomedical image segmentation
A definite trend in Biomedical Imaging is the integration of increasingly complex interpretative layers into the pure data acquisition process. One of the most interesting and anticipated goals in the field is the automatic segmentation of objects of interest in extensive acquisition data, a goal that would allow Biomedical Imaging to look beyond its use as a purely assistive tool and become a cornerstone of ambitious large-scale challenges like the extensive quantitative study of the Human Brain.
In 2019, Convolutional Neural Networks (CNNs) represent the state of the art in Biomedical Image segmentation, and scientific interest from a variety of fields, spanning from automotive to natural resource exploration, converges on their development. While most applications of CNNs focus on single-image segmentation, biomedical image data (be it MRI, CT scans, microscopy, etc.) often benefits from a three-dimensional volumetric expression.
This work explores a reformulation of the CNN segmentation problem that is native to the 3D nature of the data, with particular interest in applications to Fluorescence Microscopy volumetric data produced at the European Laboratories for Nonlinear Spectroscopy in the context of two different large international human brain study projects: the Human Brain Project and the White House BRAIN Initiative.
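The basic volumetric building block such 3D reformulations stack is the 3D convolution. A minimal NumPy sketch, purely for illustration (a real segmentation network would use a deep learning framework and learn its kernels):

```python
# Sketch: a single "valid" 3D convolution pass over a voxel volume
# (cross-correlation, as is conventional in CNNs).
import numpy as np

def conv3d(volume, kernel):
    """Slide a 3D kernel over a 3D volume and sum elementwise products."""
    kd, kh, kw = kernel.shape
    d, h, w = volume.shape
    out = np.zeros((d - kd + 1, h - kh + 1, w - kw + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[z, y, x] = np.sum(volume[z:z+kd, y:y+kh, x:x+kw] * kernel)
    return out

volume = np.zeros((7, 7, 7))
volume[3, 3, 3] = 1.0                  # a single bright voxel
mean_kernel = np.ones((3, 3, 3)) / 27  # 3x3x3 averaging filter
response = conv3d(volume, mean_kernel)
# the response is nonzero only where the kernel window overlaps the voxel
```

Unlike applying a 2D convolution slice by slice, the kernel here mixes information across all three axes at once, which is what makes the formulation native to volumetric data.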
Automated CAD Model Generation for Structural Optimisation
Computer-aided design (CAD) models play a crucial role in the design, manufacturing and maintenance of products. Therefore, the mesh-based finite element descriptions common in structural optimisation must first be translated into CAD models. Currently, this translation either has to be performed semi-manually or fails to preserve the structural optimality found by the structural optimisation, due to the intrinsic difference in geometric representation between a finite element mesh and a CAD model.
This thesis proposes a fully automated and topologically accurate approach to synthesise structurally sound parametric CAD models from topology-optimised finite element models, to fill the long-standing gap between structural optimisation and CAD systems. This approach successfully preserves the optimal structural performance during the mesh-to-CAD conversion.
The solution provided in this thesis is to first convert the topology-optimised structure into a spatial frame structure and then to regenerate it in a CAD system using standard constructive solid geometry (CSG) operations. The obtained parametric CAD models are compact, that is, they have as few geometric parameters as possible, which makes them ideal for editing and further processing within a CAD system. The critical task of converting the topology-optimised structure into an optimal spatial frame structure is accomplished in several steps. The first step is to generate a one-voxel-wide voxel chain model from the topology-optimised voxel model using a topology-preserving skeletonisation algorithm from digital topology. The undirected graph defined by the voxel chain model yields a spatial frame structure after processing it with the proposed graph algorithms. Subsequently, the cross-sections and layout of the frame members are optimised to recover the optimality that may have been compromised during the conversion process. Finally, the obtained frame structure is generated in a CAD system by repeatedly combining primitive solids, such as cylinders and spheres, using boolean operations. The resulting solid model is a boundary representation (B-Rep) consisting of trimmed non-uniform rational B-spline (NURBS) curves and surfaces.
The numerical studies in this thesis show that the converted spatial frame structures retain equivalent structural performance. Moreover, CAD models generated from the spatial frame structures have significantly fewer geometric degrees of freedom compared to the topology-optimised structures. Though the numerical studies use topology-optimised structures as input and compact CSG models as output, this thesis also shows how to extend the proposed generation process to take other optimised meshes as input and to produce outputs in various geometric representations. This offers a wide range of possible applications and brings new ideas to industrial design and manufacturing.
Chinese Scholarship Counci
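One intermediate step of the pipeline, reading an undirected graph off a one-voxel-wide voxel chain, can be sketched directly. The chain coordinates and the 26-connectivity rule below are illustrative assumptions, not the thesis's exact graph algorithms:

```python
# Sketch: build an undirected graph from a skeletonised voxel chain,
# connecting voxels that are 26-connected (Chebyshev distance 1) neighbours.
from itertools import product

def voxel_chain_to_graph(voxels):
    """Nodes are voxel coordinates; edges join 26-connected neighbours."""
    voxel_set = set(voxels)
    edges = set()
    for v in voxel_set:
        for d in product((-1, 0, 1), repeat=3):
            if d == (0, 0, 0):
                continue
            n = (v[0] + d[0], v[1] + d[1], v[2] + d[2])
            if n in voxel_set:
                edges.add(tuple(sorted((v, n))))
    return voxel_set, edges

# an L-shaped chain: frame members would be fitted along its straight runs,
# with a joint at the corner voxel (2, 0, 0)
chain = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (2, 2, 0)]
nodes, edges = voxel_chain_to_graph(chain)
```

Note that 26-connectivity also links (1, 0, 0) to (2, 1, 0) diagonally across the corner; pruning such shortcut edges and merging collinear runs into single frame members is exactly the kind of work the thesis's subsequent graph processing performs.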