
    Declarative Cleaning, Analysis, and Querying of Graph-structured Data

    Much of today's data, including social, biological, sensor, computer, and transportation network data, is naturally modeled and represented by graphs. Typically, data describing these networks is observational, and thus noisy and incomplete. Therefore, methods for efficiently managing graph-structured data of this nature are needed, especially given the abundance and increasing size of such data. In my dissertation, I develop declarative methods to perform cleaning, analysis, and querying of graph-structured data efficiently. For declarative cleaning of graph-structured data, I identify a set of primitives to support the extraction and inference of the underlying true network from observational data, and describe a framework that enables a network analyst to easily implement and combine new extraction and cleaning techniques. The task specification language is based on Datalog with a set of extensions designed to enable different graph cleaning primitives. For declarative analysis, I introduce 'ego-centric pattern census queries', a new type of graph analysis query that supports searching for structural patterns in every node's neighborhood and reporting their counts for further analysis. I define an SQL-based declarative language to support this class of queries, and develop a series of efficient query evaluation algorithms for it. Finally, I present an approach for querying large uncertain graphs that supports reasoning about uncertainty of node attributes, uncertainty of edge existence, and a new type of uncertainty, called identity linkage uncertainty, where a group of nodes can potentially refer to the same real-world entity. I define a probabilistic graph model to capture all these types of uncertainty and to resolve identity linkage merges. I propose 'context-aware path indexing' and 'join-candidate reduction' methods to efficiently enable subgraph matching queries over large uncertain graphs of this type.
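As a concrete illustration of what an ego-centric pattern census computes, the sketch below counts one fixed structural pattern, triangles, inside each node's ego network (the node plus its neighbors). This is a minimal hand-rolled sketch on a toy graph; the dissertation's queries support arbitrary patterns through an SQL-based language, none of which is reproduced here.

```python
from itertools import combinations

def ego_triangle_census(adj):
    """Count, for each node, the triangles inside its ego network
    (the node together with its neighbors). A census over one fixed
    pattern; the dissertation's query language supports arbitrary
    structural patterns and an SQL-based syntax."""
    counts = {}
    for v, nbrs in adj.items():
        ego = nbrs | {v}
        tri = 0
        for a, b, c in combinations(sorted(ego), 3):
            if b in adj[a] and c in adj[a] and c in adj[b]:
                tri += 1
        counts[v] = tri
    return counts

# Undirected toy graph: a 4-clique {0, 1, 2, 3} plus a pendant node 4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = {v: set() for v in range(5)}
for u, w in edges:
    adj[u].add(w)
    adj[w].add(u)

print(ego_triangle_census(adj))
```

Every node of the 4-clique sees four triangles in its ego network, while the pendant node sees none; a census query reports such per-node counts for downstream analysis.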

    Time-dependent routing: models, algorithms, and the value of information

    The vehicle routing problem (VRP), introduced more than 60 years ago, is at the core of transportation systems. With decades of development, the VRP is one of the most studied problems in the literature, with a very rich set of variants. Yet, primarily due to the lack of data, two critical assumptions make the VRP fail to adapt effectively to traffic and congestion. The first assumption considers that the travel speed is constant over time; the second, that each pair of customers is connected by an arc, ignoring the underlying street network. Traffic congestion is one of the biggest challenges in transportation systems. As traffic directly affects transportation activities, the whole supply chain needs to adjust to this factor. The continuous growth of freight in recent years worsens the situation, and a renewed focus on mobility, environment, and city logistics has shed light on these issues. Recently, advances in communications and real-time data acquisition technologies have made it possible to collect vehicle data such as location, acceleration, driving speed, and deceleration. With the availability of this data, one can question the way we define, model, and solve transportation problems. This allows us to overcome the two issues indicated before and integrate congestion information and the whole underlying street network. We start by considering the whole underlying street network, which means we have customer nodes and intermediate nodes that constitute the street network. Then, we model the travel time of each street during the day.
By dividing the day into small intervals, up to a precision of a second, we consider precise traffic information. This results in a new problem called the time-dependent shortest path vehicle routing problem (TD-SPVRP), in which we combine the time-dependent shortest path problem (TD-SPP) and the time-dependent VRP (TD-VRP), creating a more general and very challenging problem. The TD-SPVRP is closer to real-world conditions, and it constitutes the topic of Chapter 2, where we formulate it as a mixed-integer linear programming model and design a fast and efficient heuristic algorithm to solve it. We test both on instances generated from actual traffic data from the road network of Québec City, Canada. Results show that the heuristic provides high-quality solutions with an average gap of only 5.66% from the lower bounds given by the model, while the mathematical model itself fails to find a solution for any real instance. To solve this challenging problem, we emphasize the importance of a high-performance implementation to improve the speed and execution time of the algorithms. Still, the problem is huge, especially when we work on a large area of the underlying street network alongside very precise traffic data. To this end, we use different techniques to optimize the computational effort while assessing the impact on precision, to avoid the loss of valuable information. Two types of data aggregation are developed, covering two different levels of information. First, we manipulate the structure of the network by reducing its size; second, we control the time aggregation level used to generate the traffic data, and thus the data used to determine the speed of a vehicle at any time. For the network structure, we use different graph reduction techniques to reduce the size of the road graph. We study the value and the trade-off of spatial information.
Solutions generated using the reduced graph are analyzed in Chapter 3 to evaluate the quality and the loss of information from the reduction. We show that the transformation of the TD-SPVRP into an equivalent TD-VRP results in a large graph that requires significant preprocessing time, which impacts the solution quality: solving the TD-SPVRP takes on average 1,445 seconds, while solving the related TD-VRP takes 41,181 seconds. Keeping a high level of precision while reducing the size of the graph is possible. In particular, we develop two reduction procedures, node reduction and parallel arc reduction. Both techniques reduce the size of the graph, with different results: node reduction leads to an improved gap of 1.11%, while parallel arc reduction gives a gap of 2.57%, indicating a distortion in the reduced graph. Regarding the traffic information, we analyze the compromise between a massive amount of very precise data and a smaller volume of aggregated data with some potential information loss. This is done by analyzing the precision of the aggregated data under different travel time models; these developments appear in Chapter 4. Our analysis indicates that full coverage of the street network at any time of the day is required to achieve a high level of precision. Using high aggregation results in a smaller problem with better data coverage, but at the cost of a loss of information. We analyze two travel time estimation models, the link travel model (LTM) and the flow speed model (FSM). Both share the same performance when working with large time intervals (120, 300, and 600 seconds), thus a higher level of aggregation, with an absolute average gap of 5.5% to the observed route travel time. With short periods (1, 10, 30, and 60 seconds), FSM performs better than LTM: for a 1-second interval, FSM gives an average absolute gap of 6.70%, while LTM gives 11.17%. This thesis is structured as follows.
After a general introduction in which we present the conceptual framework of the thesis and its organization, Chapter 1 presents the literature review for the two main problems of our development, the shortest path problem (SPP) and the VRP, and their time-dependent variants developed over the years. Chapter 2 introduces a new VRP variant, the TD-SPVRP. Chapter 3 presents the different techniques developed to reduce the size of the network by manipulating spatial information of the road network. The impact of these reductions is evaluated and analyzed on real data instances using multiple heuristics. Chapter 4 covers the impact of temporal data aggregation and travel time models on the precision of travel time estimates against observed travel times. The last chapter concludes and presents some research perspectives for our work.
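The shortest-path subproblem at the heart of the TD-SPVRP can be sketched as a Dijkstra search whose arc costs depend on departure time. The sketch below is illustrative only, under a FIFO assumption (leaving later never means arriving earlier), and is not the thesis's MILP formulation; the toy network, congestion window, and travel time functions are invented.

```python
import heapq

def td_dijkstra(graph, source, target, t0):
    """Time-dependent Dijkstra under the FIFO assumption:
    each arc's travel time is a function tau(t) of the departure
    time t at its tail, so `graph[u]` maps successor v -> tau.
    Returns the earliest arrival time at `target` when leaving
    `source` at time t0."""
    dist = {source: t0}
    pq = [(t0, source)]
    while pq:
        t, u = heapq.heappop(pq)
        if u == target:
            return t
        if t > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, tau in graph[u].items():
            arrival = t + tau(t)
            if arrival < dist.get(v, float("inf")):
                dist[v] = arrival
                heapq.heappush(pq, (arrival, v))
    return float("inf")

# Toy network: the arc (A, B) becomes congested from t = 420 onward,
# which keeps the FIFO property (arrival is non-decreasing in t).
def congested(t):
    return 30.0 if t >= 420 else 10.0

graph = {
    "A": {"B": congested, "C": lambda t: 15.0},
    "B": {"D": lambda t: 5.0},
    "C": {"D": lambda t: 12.0},
    "D": {},
}

print(td_dijkstra(graph, "A", "D", t0=400))  # departs before congestion
print(td_dijkstra(graph, "A", "D", t0=430))  # departs during congestion
```

Leaving at t = 400, the fast route through B wins; leaving at t = 430, the detour through C becomes optimal, which is exactly the effect a constant-speed VRP model cannot capture.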

    Transport systems analysis: models and data

    Funding: This research project has been funded by Spanish R+D Programs, specifically under Grant PID2020-112967GB-C31. Rapid advancements in new technologies, especially information and communication technologies (ICT), have significantly increased the number of sensors that capture data, namely those embedded in mobile devices. This wealth of data has garnered particular interest in analyzing transport systems, with some researchers arguing that the data alone are sufficient to render transport models unnecessary. However, this paper takes a contrary position and holds that models and data are not mutually exclusive but rather depend upon each other. Transport models are built upon established families of optimization and simulation approaches, and their development aligns with the scientific principles of operations research, which involve acquiring knowledge to derive modeling hypotheses. We provide an overview of these modeling principles and their application to transport systems, presenting numerous models that vary according to study objectives and corresponding modeling hypotheses. The data required for building, calibrating, and validating selected models are discussed, along with examples of using data analytics techniques to collect and handle the data supplied by ICT applications. The paper concludes with some comments on current and future trends.

    Robot Motion Planning Under Topological Constraints

    My thesis addresses the problem of manipulation using multiple robots with cables. I study how robots with cables can tow objects in the plane, on the ground, and on water, and how they can carry suspended payloads in the air. Specifically, I focus on planning optimal trajectories for robots. Path planning or trajectory generation for robotic systems is an active area of research in robotics, and many algorithms have been developed for different robotic systems. One can classify planning algorithms into two broad categories. The first is graph-search-based motion planning over discretized configuration spaces. These algorithms are complete, quite efficient for finding optimal paths in cluttered 2-D and 3-D environments, and widely used [48]. The other class comprises optimal-control-based methods. In most cases, the optimal control problem of generating optimal trajectories can be framed as a nonlinear and nonconvex optimization problem, which is hard to solve; recent work has attempted to overcome these shortcomings [68]. Advances in computational power and more sophisticated optimization algorithms have allowed us to solve more complex problems faster. However, our main interest is incorporating topological constraints. Topological constraints naturally arise when cables are used to wrap around objects. They are also important when robots have to move one way around obstacles rather than the other. Thus I consider the optimal trajectory generation problem under topological constraints, and pursue problems that can be solved in finite time with guaranteed globally optimal solutions. In my thesis, I first consider the problem of planning optimal trajectories around obstacles using optimal control methodologies.
I then present the mathematical framework and algorithms for multi-robot topological exploration of unknown environments, in which the main goal is to identify the different topological classes of paths. Finally, I address the manipulation and transportation of multiple objects with cables. Here I consider teams of two or three ground robots towing objects on the ground, two or three aerial robots carrying a suspended payload, and two boats towing a boom, with applications to oil skimming and cleanup. In all these problems, it is important to consider the topological constraints on the cable configurations as well as those on the paths of the robots. I present solutions to the trajectory generation problem for all of these settings.
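One standard way to make topological path classes computable is a crossing-based signature: cast a ray upward from a representative point of each obstacle and record the path's signed crossings of that ray, cancelling adjacent inverse letters. Two paths with the same endpoints and the same reduced word lie in the same homotopy class. The sketch below is a simplified version of this idea; the obstacle placement and labels are invented, and the thesis's own machinery is richer.

```python
def crossing_signature(path, obstacles):
    """Reduced word of signed crossings of a polygonal planar path
    with vertical rays cast upward from obstacle representative
    points. `path` is a list of (x, y) points; `obstacles` maps a
    label to its representative point. A left-to-right crossing
    contributes the label, right-to-left its inverse (label + "'")."""
    word = []
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        for label, (ox, oy) in obstacles.items():
            if (x1 < ox) != (x2 < ox):               # segment spans the ray's x
                yc = y1 + (y2 - y1) * (ox - x1) / (x2 - x1)
                if yc > oy:                          # crosses above the obstacle
                    letter = label if x2 > x1 else label + "'"
                    if word and word[-1] != letter and word[-1].rstrip("'") == label:
                        word.pop()                   # cancel with its inverse
                    else:
                        word.append(letter)
    return word

obstacle = {"o1": (0.0, 0.0)}
above = [(-1.0, 1.0), (1.0, 1.0)]    # passes over the obstacle
below = [(-1.0, -1.0), (1.0, -1.0)]  # passes under it
print(crossing_signature(above, obstacle))
print(crossing_signature(below, obstacle))
```

The two paths share endpoints up to reflection but get different signatures (['o1'] versus []), capturing the "one way around the obstacle rather than the other" distinction mentioned above.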

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    Component-based synthesis of motion planning algorithms

    Combinatory Logic Synthesis (CLS) generates data or runnable programs according to formal type specifications. Synthesis results are composed based on a user-specified repository of components, which brings several advantages for representing spaces of high variability. This work suggests strategies to manage the resulting variations by proposing a domain-specific brute-force search and a machine-learning-based optimization procedure. The brute-force search involves the iterative generation and evaluation of machining strategies. In contrast, the machine learning optimization uses statistical models to enable exploration of the design space. Both approaches involve synthesizing programs and meta-programs that manipulate, run, and evaluate programs. The methodologies are applied to the domain of motion planning algorithms, and they include the configuration of programs belonging to different algorithmic families. The study of the domain led to the identification of variability points and possible variations. Proof-of-concept repositories represent these variability points and incorporate them into their semantic structure. The selected algorithmic families involve specific computation steps or data structures, and corresponding software components represent possible variations. Experimental results demonstrate that CLS enables synthesis-driven domain-specific optimization procedures to solve complex problems by exploring spaces of high variability.
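The core idea of composing programs from a typed component repository can be sketched as a toy type-inhabitation search: starting from zero-argument components, repeatedly apply any component whose argument types are already derivable until the goal type is reached. This is not the CLS algorithm (which handles intersection types and infinite solution spaces); the component names and types below are invented to echo a motion-planning pipeline.

```python
def synthesize(repository, goal):
    """Find one composition of typed components producing `goal`.
    Each repository entry maps a component name to a pair
    (argument-type tuple, result type). A fixpoint inhabitation
    sketch in the spirit of Combinatory Logic Synthesis."""
    known = {}  # type -> term (string) that produces it
    changed = True
    while changed:
        changed = False
        for name, (args, result) in repository.items():
            if result not in known and all(a in known for a in args):
                term = name if not args else f"{name}({', '.join(known[a] for a in args)})"
                known[result] = term
                changed = True
    return known.get(goal)  # None if the goal type is uninhabited

# Hypothetical repository for a motion-planning pipeline; types are strings.
repository = {
    "loadMap":      ((), "OccupancyGrid"),
    "toPolygons":   (("OccupancyGrid",), "PolygonMap"),
    "buildRoadmap": (("PolygonMap",), "Roadmap"),
    "planPath":     (("Roadmap",), "Path"),
}

print(synthesize(repository, "Path"))
```

Swapping in alternative components with the same result types (a different roadmap builder, say) changes the synthesized composition without touching the search, which is the variability-management benefit the abstract describes.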

    Social Network Extraction and Exploration of Historic Correspondences

    Historic correspondences, in the form of letters, provide a scenario in which historic figures and events are reflected, and thus play an important role in the study of history. Confronted with the digitization of thousands of historic letters, and motivated by the potentially valuable insights into history and the quantitative relations between historic persons, researchers have recently focused on the network analysis of historic correspondences. However, most related research constructs correspondence networks only from the sender-recipient relation, with visualization as the objective. Very few works proceed beyond this stage to detailed modeling of correspondence networks, let alone to novel concepts and algorithms derived from network analysis, or to formal approaches to the data uncertainty inherent in historic correspondence. In this dissertation, we develop a comprehensive correspondence network model, which integrates the personal, temporal, geographical, and topic information extracted from letter metadata and letter content into a hypergraph structure. Based on this model, we analyze three types of person-person relations (sender-recipient, co-sender, and co-recipient) and two types of person-topic relations (author-topic and sender-recipient-topic), both statically and dynamically. We develop multiple measurements, such as local and global reciprocity for quantifying reciprocal behavior in weighted networks, and the topic participation score for quantifying the interests or focus of individuals or real-life communities. We investigate the rising and fading trends of topics in order to find correlations among persons, topics, and historic events. Furthermore, we develop a novel probabilistic framework for the refinement of uncertain person names, geographical location names, and temporal expressions in the metadata of historic letters.
We conduct extensive experiments using letter collections to validate and evaluate the proposed models and measurements. A thorough discussion of the experimental results shows the effectiveness, applicability, and advantages of our models and approaches.
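As one hypothetical instance of a reciprocity measure on a weighted correspondence network (the dissertation defines its own local and global variants, which may differ), the sketch below computes the fraction of a node's outgoing letter weight that is answered: sum_j min(w_ij, w_ji) / sum_j w_ij. The correspondents and letter counts are invented.

```python
def local_reciprocity(weights, node):
    """Fraction of a node's outgoing correspondence weight that is
    reciprocated: sum_j min(w_ij, w_ji) / sum_j w_ij, where
    `weights` maps directed edges (i, j) to letter counts. One
    common formulation for weighted directed networks; illustrative
    only, not the dissertation's exact definition."""
    out = {j: w for (i, j), w in weights.items() if i == node}
    if not out:
        return 0.0
    reciprocated = sum(min(w, weights.get((j, node), 0)) for j, w in out.items())
    return reciprocated / sum(out.values())

# Letters exchanged among three hypothetical correspondents.
letters = {
    ("A", "B"): 10, ("B", "A"): 4,  # partially reciprocated
    ("A", "C"): 5,                  # never answered
}
print(local_reciprocity(letters, "A"))  # min(10, 4) + min(5, 0) over 15
```

Here A's score is 4/15: only B's four replies count against A's fifteen outgoing letters, while B's score is 1.0 because every letter B sent was more than answered.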

    Abstracts for the twenty-first European workshop on Computational geometry, Technische Universiteit Eindhoven, The Netherlands, March 9-11, 2005

    This volume contains abstracts of the papers presented at the 21st European Workshop on Computational Geometry, held at TU Eindhoven (the Netherlands) on March 9–11, 2005. There were 53 papers presented at the workshop, covering a wide range of topics. This record number shows that the field of computational geometry is very much alive in Europe. We wish to thank all the authors who submitted papers and presented their work at the workshop. We believe that this has led to a collection of very interesting abstracts that are both enjoyable and informative for the reader. Finally, we are grateful to TU Eindhoven for their support in organizing the workshop and to the Netherlands Organisation for Scientific Research (NWO) for sponsoring the workshop.

    Discrete Path Planning Strategies for Coverage and Multi-Robot Rendezvous

    This thesis addresses the problem of motion planning for autonomous robots, given a map and an estimate of the robot pose within it. The motion planning problem for a mobile robot can be defined as computing a trajectory in an environment from one pose to another while avoiding obstacles and optimizing some objective, such as path length or travel time, subject to constraints like vehicle dynamics limitations. More complex planning problems, such as multi-robot planning or complete coverage of an area, can also be defined within a similar optimization structure. The computational complexity of path planning presents a considerable challenge for real-time execution with limited resources, and the various methods that simplify the problem formulation by discretizing the solution space are grouped under the class of discrete planning methods. This approach represents the environment as a roadmap graph and formulates shortest path problems to compute optimal robot trajectories on it. This thesis presents two main contributions under the framework of discrete planning. The first contribution addresses complete coverage of an unknown environment by a single omnidirectional ground rover. The 2D occupancy grid map of the environment is first converted into a polygonal representation and decomposed into a set of convex sectors. Second, a coverage path is computed through the sectors using a hierarchical inter-sector and intra-sector optimization structure. It should be noted that both convex decomposition and optimal sector ordering are known NP-hard problems, which are solved using a greedy cut approximation algorithm and Travelling Salesman Problem (TSP) heuristics, respectively. The second contribution presents multi-robot path-planning strategies for recharging autonomous robots performing a persistent task. The work considers the case of surveillance missions performed by a team of Unmanned Aerial Vehicles (UAVs).
The goal is to plan minimum cost paths for a separate team of dedicated charging robots such that they rendezvous with and recharge all the UAVs as needed. To this end, planar UAV trajectories are discretized into sets of charging locations, and a partitioned directed acyclic graph subject to timing constraints is defined over them. Solutions consist of paths through the graph for each of the charging robots. The rendezvous planning problem for a single recharge cycle is formulated as a Mixed Integer Linear Program (MILP), and an algorithmic approach, using a transformation to the TSP, is presented as a scalable heuristic alternative to the MILP. The solution is then extended to longer planning horizons using both a receding horizon and an optimal fixed horizon strategy. Simulation results are presented for both contributions, demonstrating the solution quality and performance of the presented algorithms.
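The NP-hard sector-ordering step can be illustrated with the simplest TSP heuristic, nearest neighbor: repeatedly visit the closest unvisited sector. The thesis applies TSP heuristics to this ordering problem, though not necessarily this one, and the sector centroids below are invented for illustration.

```python
import math

def nearest_neighbor_order(points, start=0):
    """Nearest-neighbor TSP heuristic: starting from `start`,
    repeatedly move to the closest unvisited point. Returns the
    visiting order as a list of indices into `points`. A standard
    baseline, not the thesis's exact heuristic."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        cur = points[order[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(cur, points[i]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

# Centroids of four hypothetical coverage sectors.
sectors = [(0.0, 0.0), (10.0, 0.0), (1.0, 1.0), (9.0, 1.0)]
print(nearest_neighbor_order(sectors))
```

On this instance the heuristic tours the two nearby sectors before the two distant ones, yielding the order [0, 2, 3, 1]; better TSP heuristics (2-opt, Lin-Kernighan) refine such an initial tour.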