76 research outputs found

    An Even Faster and More Unifying Algorithm for Comparing Trees via Unbalanced Bipartite Matchings

    A widely used method for determining the similarity of two labeled trees is to compute a maximum agreement subtree of the two trees. Previous work on this similarity measure considers only labeled trees of two special kinds: uniformly labeled trees (i.e., trees with all their nodes labeled by the same symbol) and evolutionary trees (i.e., leaf-labeled trees with distinct symbols for distinct leaves). This paper presents an algorithm for comparing trees that are labeled in an arbitrary manner. In addition to being more general, this algorithm is faster than the previous algorithms. Another contribution of this paper concerns maximum weight bipartite matchings. We show how to speed up the best known matching algorithms when the input graphs are node-unbalanced or weight-unbalanced. Based on these enhancements, we obtain an efficient algorithm for a new matching problem called the hierarchical bipartite matching problem, which is at the core of our maximum agreement subtree algorithm. Comment: To appear in Journal of Algorithm

    A Unified View of Evaluation Metrics for Structured Prediction

    We present a conceptual framework that unifies a variety of evaluation metrics for different structured prediction tasks (e.g., event and relation extraction, syntactic and semantic parsing). Our framework requires representing the outputs of these tasks as objects of certain data types, and derives metrics through matching of common substructures, possibly followed by normalization. We demonstrate how commonly used metrics for a number of tasks can be succinctly expressed by this framework, and show that new metrics can be naturally derived in a bottom-up way based on an output structure. We release a library that enables this derivation to create new metrics. Finally, we consider how specific characteristics of tasks motivate metric design decisions, and suggest possible modifications to existing metrics in line with those motivations. Comment: Accepted at EMNLP 2023 Main Trac
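
As a rough illustration of "matching of common substructures, possibly followed by normalization", the sketch below scores a predicted set of relation triples against a reference set by exact matching and then normalizes the overlap into precision, recall, and F1. The function name `matching_f1` and the triple representation are assumptions made for this example; this is not the API of the library the authors release.

```python
from typing import Hashable, Iterable, Tuple


def matching_f1(
    predicted: Iterable[Hashable], reference: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Hypothetical helper: exact-match overlap normalized to P/R/F1."""
    pred, ref = set(predicted), set(reference)
    # Matching step: count substructures common to both outputs.
    overlap = len(pred & ref)
    # Normalization step: divide the overlap by each side's size.
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    if overlap == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only: (head, relation, tail) triples.
predicted = {("Alice", "works_for", "Acme"), ("Bob", "lives_in", "Paris")}
reference = {
    ("Alice", "works_for", "Acme"),
    ("Bob", "lives_in", "Rome"),
    ("Carol", "works_for", "Acme"),
}
precision, recall, f1 = matching_f1(predicted, reference)
```

Here one of the two predicted triples matches one of the three reference triples, so precision is 1/2 and recall is 1/3. Richer metrics in this spirit would replace the exact set intersection with a weighted matching over nested substructures.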

    29th International Symposium on Algorithms and Computation: ISAAC 2018, December 16-19, 2018, Jiaoxi, Yilan, Taiwan


    Active Information Acquisition With Mobile Robots

    The recent proliferation of sensors and robots has the potential to transform fields as diverse as environmental monitoring, security and surveillance, localization and mapping, and structure inspection. One of the great technical challenges in these scenarios is to control the sensors and robots in order to extract accurate information about various physical phenomena autonomously. The goal of this dissertation is to provide a unified approach for active information acquisition with a team of sensing robots. We formulate a decision problem for maximizing relevant information measures, constrained by the motion capabilities and sensing modalities of the robots, and focus on the design of a scalable control strategy for the robot team. The first part of the dissertation studies the active information acquisition problem in the special case of linear Gaussian sensing and mobility models. We show that the classical principle of separation between estimation and control holds in this case. It enables us to reduce the original stochastic optimal control problem to a deterministic version and to provide an optimal centralized solution. Unfortunately, the complexity of obtaining the optimal solution scales exponentially with the length of the planning horizon and the number of robots. We develop approximation algorithms to manage the complexity in both of these factors and provide theoretical performance guarantees. Applications in gas concentration mapping, joint localization and vehicle tracking in sensor networks, and active multi-robot localization and mapping are presented. Coupled with linearization and model predictive control, our algorithms can even generate adaptive control policies for nonlinear sensing and mobility models.
    Linear Gaussian information seeking, however, cannot be applied directly in the presence of sensing nuisances such as missed detections, false alarms, and ambiguous data association, or when some sensor observations are discrete (e.g., object classes, medical alarms) or, even worse, when the sensing and target models are entirely unknown. The second part of the dissertation considers these complications in the context of two applications: active localization from semantic observations (e.g., recognized objects) and radio signal source seeking. The complexity of the target inference problem forces us to resort to greedy planning of the sensor trajectories. Non-greedy closed-loop information acquisition with general discrete models is achieved in the final part of the dissertation via dynamic programming and Monte Carlo tree search algorithms. Applications in active object recognition and pose estimation are presented. The techniques developed in this thesis offer an effective and scalable approach for controlled information acquisition with multiple sensing robots and have broad applications to environmental monitoring, search and rescue, security and surveillance, localization and mapping, precision agriculture, and structure inspection.

    Multimodal Character Representation for Visual Story Understanding

    Stories are one of the main tools that humans use to make sense of the world around them. This ability is conjectured to be uniquely human, and concepts of agency and interaction have been found to develop during childhood. However, state-of-the-art artificial intelligence models still find it very challenging to represent or understand such information about the world. Over the past few years, there has been a lot of research into building systems that can understand the contents of images, videos, and text. Despite several advances, computers still struggle to understand high-level discourse structures or how visuals and language are organized to tell a coherent story. Recently, several efforts have been made towards building story understanding benchmarks. As characters are the key component around which the story events unfold, character representations, such as their names, appearances, and relations to other characters, are crucial for deep story understanding. As a step towards endowing systems with a richer understanding of characters in a given narrative, this thesis develops new techniques that rely on the vision, audio and language channels to address three important challenges: i) speaker recognition and identification, ii) character representation and embedding, and iii) temporal modeling of character relations. We propose a multi-modal unsupervised model for speaker naming in movies, a novel way to represent movie character names in dialogues, and a multi-modal supervised character relation classification model. We also show that our approach improves systems' ability to understand narratives, which is measured using several tasks such as their ability to answer questions about stories on several benchmarks. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/153444/1/mazab_1.pd

    Multipartite Graph Algorithms for the Analysis of Heterogeneous Data

    The explosive growth in the rate of data generation in recent years threatens to outpace the growth in computer power, motivating the need for new, scalable algorithms and big data analytic techniques. No field may be more emblematic of this data deluge than the life sciences, where technologies such as high-throughput mRNA arrays and next generation genome sequencing are routinely used to generate datasets of extreme scale. Data from experiments in genomics, transcriptomics, metabolomics and proteomics are continuously being added to existing repositories. A goal of exploratory analysis of such omics data is to illuminate the functions and relationships of biomolecules within an organism. This dissertation describes the design, implementation and application of graph algorithms, with the goal of seeking dense structure in data derived from omics experiments in order to detect latent associations between often heterogeneous entities, such as genes, diseases and phenotypes. Exact combinatorial solutions are developed and implemented, rather than relying on approximations or heuristics, even when problems are exceedingly large and/or difficult. Datasets on which the algorithms are applied include time series transcriptomic data from an experiment on the developing mouse cerebellum, gene expression data measuring acute ethanol response in the prefrontal cortex, and the analysis of a predicted protein-protein interaction network. A bipartite graph model is used to integrate heterogeneous data types, such as genes with phenotypes and microbes with mouse strains. The techniques are then extended to a multipartite algorithm to enumerate dense substructure in multipartite graphs, constructed using data from three or more heterogeneous sources, with applications to functional genomics. Several new theoretical results are given regarding multipartite graphs and the multipartite enumeration algorithm. 
    In all cases, practical implementations are demonstrated to expand the frontier of computational feasibility.