52 research outputs found

    Modeling Users' Information Needs in a Document Recommender for Meetings

    People are surrounded by an unprecedented wealth of information. Access to it depends on the availability of suitable search engines, but even when these are available, people often do not initiate a search, because their current activity does not allow it or because they are unaware that the information exists. Just-in-time retrieval brings a radical change to the process of query-based retrieval by proactively retrieving documents relevant to users' current activities, in an easily accessible and non-intrusive manner. This thesis presents a novel set of methods intended to improve the relevance of a just-in-time retrieval system, specifically a document recommender system designed for conversations, in terms of precision and diversity of results. Additionally, we designed an evaluation protocol based on crowdsourcing to compare the methods proposed in this thesis with existing ones. In contrast to previous systems, which model users' information needs by extracting keywords from clean and well-structured texts, this system models them from conversation transcripts, which contain noise from automatic speech recognition (ASR) and have a free structure, often switching between several topics. To deal with these issues, we first propose a novel keyword extraction method which preserves both the relevance and the topical diversity of the conversation, to properly capture possible user needs with minimal ASR noise. Implicit queries are then built from these keywords. However, the presence of multiple unrelated topics in one query introduces significant noise into the retrieval results. To reduce this effect, we separate users' needs by topically clustering keyword sets into several subsets, or implicit queries. We introduce a merging method which combines the results of the multiple queries prepared from the users' conversation to generate a concise, diverse and relevant list of documents.
This method ensures that the system does not distract its users from their current conversation by frequently recommending a large number of documents. Moreover, we address the problem of explicit queries that users may ask during a conversation. We introduce a query refinement method which leverages the conversation context to answer the users' information needs without asking for additional clarifications, therefore, again, avoiding distracting users during their conversation. Finally, we implemented the end-to-end document recommender system by integrating the ideas proposed in this thesis, and we propose an evaluation scenario with human users in a brainstorming meeting.
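The merging step described above can be illustrated with a simple round-robin interleaving of the per-query ranked result lists, deduplicated and capped at a small size. This is only a sketch of the general idea; the function name, interleaving strategy and cap are illustrative assumptions, not the thesis's actual method:

```python
def merge_results(ranked_lists, limit=5):
    """Interleave ranked result lists from several implicit queries,
    skipping duplicates, to produce one concise, diverse list.
    Hypothetical sketch of a result-merging step, not the thesis's method."""
    merged, seen = [], set()
    for rank in range(max(map(len, ranked_lists), default=0)):
        for results in ranked_lists:
            # Take the rank-th result of each query in turn, if it is new.
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
                if len(merged) == limit:
                    return merged
    return merged
```

Interleaving by rank, rather than concatenating whole lists, is what keeps the short recommendation list diverse across topics: every implicit query contributes its top results before any query contributes its tail.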

    A Framework for Exploiting Emergent Behaviour to capture 'Best Practice' within a Programming Domain

    Inspection is a formalised process for reviewing an artefact in software engineering. It is proven to significantly reduce defects, to ensure that what is delivered is what is required, and that the finished product is effective and robust. Peer code review is a less formal inspection of code, normally classified as an inadequate or substandard Inspection. Although it has an increased risk of not locating defects, it has been shown to improve the knowledge and programming skills of its participants. This thesis examines the process of peer code review, comparing it to Inspection, and attempts to describe how an informal code review can improve the knowledge and skills of its participants by deploying an agent-oriented approach. During a review the participants discuss defects, recommendations and solutions, or more generally their own experience. It is this instant adaptability to new information that gives the review process the ability to improve knowledge. This observed behaviour can be described as the emergent behaviour of the group of programmers during the review. The wider distribution of knowledge currently happens only through programmers attending other reviews. To maximise the benefits of peer code review, a mechanism is needed by which the findings from one team can be captured and propagated to other reviews and teams throughout an establishment. A prototype multi-agent system is developed with the aim of capturing the emergent properties of a team of programmers. As the interactions between the team members are unstructured and the information traded is dynamic, a distributed adaptive system is required to provide communication channels for the team and to provide a foundation for the knowledge shared. Software agents are capable of adaptivity and learning. Multi-agent systems are particularly effective when deployed within distributed architectures and are believed to be able to capture emergent behaviour.
The prototype system illustrates that the learning mechanism within the software agents provides a solid foundation upon which the ability to detect defects can be learnt. It also demonstrates that the multi-agent approach is well suited to providing the free flow of ideas between programmers, not only achieving the sharing of defects and solutions but also operating at a high enough level to capture social information. It is assumed that this social information is a measure of one element of the review process's emergent behaviour. The system is capable of monitoring the team-perceived abilities of programmers, those who are influential on the programming style of others, and the issues upon which programmers agree or disagree. If the disagreements are classified as unimportant or stylistic issues, can it not therefore be assumed that all agreements are concepts of "Best Practice"? The conclusion is reached that code review is not a substandard Inspection but is in fact complementary to the Inspection model, as the latter improves the process of locating and identifying bugs while the former improves the knowledge and skill of the programmers, and therefore the chance of bugs not being encoded to start with. The prototype system demonstrates that it is possible to capture best practice from a review team and that agents are well suited to the task. The performance criteria of such a system have also been captured. The prototype system has also shown that a reliable level of learning can be attained for a real-world task. The innovative way of concurrently deploying multiple agents which use different approaches to achieve the same goal shows remarkable robustness when learning from small example sets.
The novel way in which autonomy is promoted within the agents' design, but constrained within the agent community, allows the system to provide a sufficiently flexible communications structure to capture emergent social behaviour, whilst ensuring that the agents remain committed to their own goals.
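The capture-and-propagate mechanism described above can be sketched as a pair of cooperating classes: per-programmer agents that record findings, and a coordinator that broadcasts them across teams. The class and method names here are hypothetical illustrations, not the thesis's actual prototype design:

```python
class ReviewAgent:
    """One agent per programmer; records the defects its programmer finds.
    A minimal hypothetical design, not the thesis's prototype."""

    def __init__(self, name):
        self.name = name
        self.known_defects = set()

    def report(self, defect):
        # Record a finding locally and hand it back for propagation.
        self.known_defects.add(defect)
        return defect


class Coordinator:
    """Propagates findings from one review to every registered agent,
    so knowledge is no longer spread only by attending other reviews."""

    def __init__(self):
        self.agents = []

    def register(self, agent):
        self.agents.append(agent)

    def broadcast(self, defect):
        for agent in self.agents:
            agent.known_defects.add(defect)
```

In this sketch the coordinator is a single hub; a genuinely distributed system, as the thesis argues, would replace it with peer-to-peer communication channels between the agents themselves.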

    Automatic extraction of concepts from texts and applications

    The extraction of relevant terms from texts is an extensively researched task in Text Mining. Relevant terms have been applied in areas such as Information Retrieval or document clustering and classification. However, relevance has a rather fuzzy nature, since the classification of some terms as relevant or not relevant is not consensual. For instance, while words such as "president" and "republic" are generally considered relevant by human evaluators, and words like "the" and "or" are not, terms such as "read" and "finish" gather no consensus about their semantics and informativeness. Concepts, on the other hand, have a less fuzzy nature. Therefore, instead of deciding on the relevance of a term during the extraction phase, as most extractors do, I propose to first extract from texts what I have called generic concepts (all concepts) and to postpone the decision about relevance to downstream applications, according to their needs. For instance, a keyword extractor may assume that the most relevant keywords are the most frequent concepts in the documents. Moreover, most statistical extractors are incapable of extracting single-word and multi-word expressions using the same methodology. These factors led to the development of the ConceptExtractor, a statistical and language-independent methodology which is explained in Part I of this thesis. In Part II, I show that the automatic extraction of concepts has great applicability. For instance, for the extraction of keywords from documents, using the Tf-Idf metric only on concepts yields better results than using Tf-Idf without concepts, especially for multi-word expressions. In addition, since concepts can be semantically related to other concepts, this allows us to build implicit document descriptors. These applications led to published work. Finally, I present some work that, although not yet published, is briefly discussed in this document.
    Fundação para a Ciência e a Tecnologia - SFRH/BD/61543/200
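The keyword-extraction application mentioned above, ranking a document's concepts by Tf-Idf, can be sketched as follows. The function treats concepts as opaque strings (so single- and multi-word expressions are handled identically); it illustrates only this downstream application, not the ConceptExtractor itself, and the function name and smoothing are illustrative assumptions:

```python
import math
from collections import Counter

def tfidf_keywords(doc_concepts, corpus_concepts, top=3):
    """Rank one document's extracted concepts by Tf-Idf.
    doc_concepts: list of concept strings extracted from the document.
    corpus_concepts: list of such lists, one per corpus document.
    Sketch of the keyword-extraction application, not the ConceptExtractor."""
    n_docs = len(corpus_concepts)
    tf = Counter(doc_concepts)

    def idf(concept):
        # Add-one smoothing keeps idf finite for unseen concepts.
        df = sum(concept in doc for doc in corpus_concepts)
        return math.log((1 + n_docs) / (1 + df))

    scored = {c: tf[c] * idf(c) for c in tf}
    return sorted(scored, key=scored.get, reverse=True)[:top]
```

Because scoring happens over concepts rather than raw tokens, a multi-word expression such as "prime minister" competes on equal footing with single words, which is the advantage the abstract claims for applying Tf-Idf to concepts.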

    Interaction Tree Specifications: A Framework for Specifying Recursive, Effectful Computations That Supports Auto-Active Verification

    This paper presents a specification framework for monadic, recursive, interactive programs that supports auto-active verification, an approach that combines user-provided guidance with automatic verification techniques. This verification tool is designed to have the flexibility of a manual approach to verification along with the usability benefits of automatic approaches. We accomplish this by augmenting Interaction Trees, a Coq data structure for representing effectful computations, with logical quantifier events. We show that this yields a language of specifications that is easy to understand, automatable, and powerful enough to handle properties that involve non-termination. Our framework is implemented as a library in Coq. We demonstrate the effectiveness of this framework by verifying real, low-level code.

    Conflict resolution algorithms for optimal trajectories in presence of uncertainty

    International Mention in the doctoral degree
    The objective of the work presented in this Ph.D. thesis is to develop a novel method to address the aircraft-obstacle avoidance problem in the presence of uncertainty, providing optimal trajectories in terms of risk of collision and time of flight. The obstacle avoidance maneuver is the result of a Conflict Detection and Resolution (CD&R) algorithm prepared for a potential conflict between an aircraft and a fixed obstacle whose position is uncertain. Due to the growing interest in Unmanned Aerial System (UAS) operations, the CD&R topic has been intensively discussed and tackled in the literature over the last 10 years. One of the crucial aspects that needs to be addressed for a safe and efficient integration of UAS vehicles into non-segregated airspace is the CD&R activity. The inherent nature of UAS, and the dynamic environment they are intended to work in, pose the challenge of making CD&R algorithms capable of handling scenarios in the presence of uncertainty. Modeling uncertainty sources accurately, and predicting future trajectories while taking stochastic events into account, are difficult issues in developing CD&R algorithms for optimal trajectories. Uncertainty about the origin of threats, variable weather hazards, and sensing and communication errors are only some of the possible uncertainty sources that jeopardize air vehicle operations. In this work, a conflict is defined as the violation of the minimum distance between a vehicle and a fixed obstacle, and conflict avoidance maneuvers can be achieved by varying only the aircraft heading angle. The CD&R problem, formulated as an Optimal Control Problem (OCP), is solved via an indirect optimal control method. The necessary conditions of optimality, namely the Euler-Lagrange equations obtained from the calculus of variations, are applied to the vehicle dynamics and to the obstacle constraint modeled as a stochastic variable.
The implicit optimality equations lead to the formulation of a Multipoint Boundary Value Problem (MPBVP) whose solution is in general not trivial. The structure of the optimal trajectory is inferred from the type of path constraint, and the trend of the Lagrange multiplier is analyzed along the optimal route. The MPBVP is first approximated by Taylor polynomials, and then solved via Differential Algebra (DA) techniques. The solution of the OCP is therefore a set of polynomials approximating the optimal controls in the presence of uncertainty, i.e., the optimal heading angles that minimize the time of flight while taking into account the uncertainty of the obstacle position. Once the obstacle is detected by on-board sensors, this method provides a useful tool that allows the pilot, or remote controller, to choose the best trade-off between the optimality and the collision risk of the avoidance maneuver. Monte Carlo simulations are run to validate the results and the effectiveness of the presented method. The method is also valid for addressing CD&R problems in the presence of storms, other aircraft, or other types of hazards in the airspace characterized by a constant relative velocity with respect to the own aircraft.
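The Monte Carlo validation step can be illustrated by estimating the collision risk of a candidate straight-line maneuver against an obstacle whose position is uncertain. The Gaussian position model, the geometry and the function name are assumptions for illustration only; the thesis solves the full OCP with Differential Algebra, which this sketch does not attempt:

```python
import math
import random

def collision_risk(heading, obstacle_mean, sigma, d_min, n=5000, seed=0):
    """Monte Carlo estimate of the probability that a straight path from
    the origin at the given heading (radians) violates the minimum
    separation d_min from an obstacle whose position is sampled from an
    isotropic Gaussian around obstacle_mean. Illustrative sketch only."""
    rng = random.Random(seed)
    ux, uy = math.cos(heading), math.sin(heading)
    hits = 0
    for _ in range(n):
        ox = rng.gauss(obstacle_mean[0], sigma)
        oy = rng.gauss(obstacle_mean[1], sigma)
        # Distance from the sampled obstacle to the ray along the heading.
        t = max(0.0, ox * ux + oy * uy)   # parameter of closest approach
        dist = math.hypot(ox - t * ux, oy - t * uy)
        hits += dist < d_min
    return hits / n
```

Evaluating this risk for a family of candidate heading angles is one way to expose the trade-off the abstract mentions between time of flight (small heading deviation) and collision risk (large deviation).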
Doctoral Program in Fluid Mechanics by Universidad Carlos III de Madrid; Universidad de Jaén; Universidad de Zaragoza; Universidad Nacional de Educación a Distancia; Universidad Politécnica de Madrid and Universidad Rovira i Virgili. Committee - President: Carlo Novara. Secretary: Lucia Pallotino. Members: Manuel Sanjurjo Rivo; Yoshinori Matsuno; Alfonso Valenzuela Romer

    Exploring annotations for deductive verification


    Semi-automated co-reference identification in digital humanities collections

    Locating specific information within museum collections represents a significant challenge for collection users. Even when the collections and catalogues exist in a searchable digital format, formatting differences and the imprecise nature of the information to be searched mean that information can be recorded in a large number of different ways. This variation exists not just between different collections, but also within individual ones. This means that traditional information retrieval techniques are badly suited to the challenge of locating particular information in digital humanities collections, and searching therefore takes an excessive amount of time and resources. This thesis focuses on a particular search problem, that of co-reference identification: the process of identifying when the same real-world item is recorded in multiple digital locations. A real-world example of a co-reference identification problem for digital humanities collections is identified and explored, in particular the time-consuming nature of identifying co-referent records. To address this problem, the thesis presents a novel method for co-reference identification between digitised records in humanities collections. Whilst the specific focus of this thesis is co-reference identification, elements of the method described also have applications for general information retrieval. The new co-reference method uses elements from a broad range of areas, including query expansion, co-reference identification, short text semantic similarity and fuzzy logic. The new method was tested against real-world collections information, and the results suggest that, in terms of the quality of the co-referent matches found, the new co-reference identification method is at least as effective as a manual search. The number of co-referent matches found, however, is higher using the new method.
The approach presented here is capable of searching collections stored using differing metadata schemas. More significantly, the approach is capable of identifying potential co-reference matches despite the highly heterogeneous and syntax-independent nature of the Gallery, Library, Archive and Museum (GLAM) search space, and of the photo-history domain in particular. The most significant benefit of the new method is, however, that it requires comparatively little manual intervention. A co-reference search using it therefore has significantly lower person-hour requirements than a manually conducted search. In addition to the overall co-reference identification method, this thesis also presents:
    • A novel and computationally lightweight short text semantic similarity metric. This new metric has a significantly higher throughput than the current prominent techniques but a negligible drop in accuracy.
    • A novel method for comparing photographic processes in the presence of variable terminology and inaccurate field information. This is the first computational approach to do so.
    AHR
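The flavour of a lightweight short-text similarity metric can be conveyed with a token-overlap (Jaccard) baseline over normalised tokens. This is an assumed baseline for illustration, not the thesis's metric, which also incorporates word-level semantics:

```python
import re

def short_text_similarity(a, b):
    """Jaccard overlap of lower-cased alphanumeric tokens: a cheap,
    high-throughput baseline for comparing short catalogue fields.
    Illustrative stand-in, not the thesis's semantic similarity metric."""
    tokenize = lambda s: set(re.findall(r"[a-z0-9]+", s.lower()))
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

Normalising case and punctuation before comparing is what lets such a metric match records despite the formatting differences between catalogues; a semantic layer would additionally match synonymous terminology, which pure token overlap cannot.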

    3D CNN methods in biomedical image segmentation

    A clear trend in Biomedical Imaging is the integration of increasingly complex interpretative layers into the pure data acquisition process. One of the most interesting and sought-after goals in the field is the automatic segmentation of objects of interest in extensive acquisition data, a target that would allow Biomedical Imaging to look beyond its use as a purely assistive tool and become a cornerstone in ambitious large-scale challenges like the extensive quantitative study of the Human Brain. In 2019, Convolutional Neural Networks represent the state of the art in Biomedical Image segmentation, and scientific interests from a variety of fields, spanning from automotive to natural resource exploration, converge on their development. While most applications of CNNs focus on single-image segmentation, biomedical image data - be it MRI, CT scans, Microscopy, etc. - often benefits from a three-dimensional volumetric representation. This work explores a reformulation of the CNN segmentation problem that is native to the 3D nature of the data, with particular interest in applications to Fluorescence Microscopy volumetric data produced at the European Laboratories for Nonlinear Spectroscopy in the context of two different large international human brain study projects: the Human Brain Project and the White House BRAIN Initiative.
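The core operation that distinguishes a volume-native 3D CNN from slice-by-slice 2D processing is the 3D convolution, whose kernel spans depth as well as height and width. A minimal pure-Python sketch of a single valid-mode 3D convolution (cross-correlation, as CNN frameworks implement it) over a (D, H, W) volume, for illustration only:

```python
def conv3d(volume, kernel):
    """Valid-mode 3D cross-correlation over a volume stored as nested
    lists of shape (D, H, W). Pure-Python sketch of the operation a 3D
    CNN layer applies; real frameworks vectorise this heavily."""
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    out = []
    for z in range(D - kd + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                # Weighted sum over the kd*kh*kw neighbourhood.
                row.append(sum(volume[z + i][y + j][x + k] * kernel[i][j][k]
                               for i in range(kd)
                               for j in range(kh)
                               for k in range(kw)))
            plane.append(row)
        out.append(plane)
    return out
```

Because the kernel slides along the depth axis too, features such as vessel or neurite continuity across adjacent microscopy planes can be detected directly, which is the advantage of the volume-native formulation the abstract describes.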