10 research outputs found

    Enhancing In-Memory Spatial Indexing with Learned Search

    Get PDF
    Spatial data is ubiquitous. Massive amounts of data are generated every day from a plethora of sources such as billions of GPS-enableddevices (e.g., cell phones, cars, and sensors), consumer-based applications (e.g., Uber and Strava), and social media platforms (e.g.,location-tagged posts on Facebook, Twitter, and Instagram). This exponential growth in spatial data has led the research communityto build systems and applications for efficient spatial data processing.In this study, we apply a recently developed machine-learned search technique for single-dimensional sorted data to spatial indexing.Specifically, we partition spatial data using six traditional spatial partitioning techniques and employ machine-learned search withineach partition to support point, range, distance, and spatial join queries. Adhering to the latest research trends, we tune the partitioningtechniques to be instance-optimized. By tuning each partitioning technique for optimal performance, we demonstrate that: (i) grid-basedindex structures outperform tree-based index structures (from 1.23Ă— to 2.47Ă—), (ii) learning-enhanced variants of commonly used spatialindex structures outperform their original counterparts (from 1.44Ă— to 53.34Ă— faster), (iii) machine-learned search within a partitionis faster than binary search by 11.79% - 39.51% when filtering on one dimension, (iv) the benefit of machine-learned search diminishesin the presence of other compute-intensive operations (e.g. scan costs in higher selectivity queries, Haversine distance computation, andpoint-in-polygon tests), and (v) index lookup is the bottleneck for tree-based structures, which could potentially be reduced by linearizingthe indexed partitions.Additional Key Words and Phrases: spatial data, indexing, machine-learning, spatial queries, geospatia

    Fast Mapping onto Census Blocks

    Full text link
    Pandemic measures such as social distancing and contact tracing can be enhanced by rapidly integrating dynamic location data and demographic data. Projecting billions of longitude and latitude locations onto hundreds of thousands of highly irregular demographic census block polygons is computationally challenging in both research and deployment contexts. This paper describes two approaches labeled "simple" and "fast". The simple approach can be implemented in any scripting language (Matlab/Octave, Python, Julia, R) and is easily integrated and customized to a variety of research goals. This simple approach uses a novel combination of hierarchy, sparse bounding boxes, polygon crossing-number, vectorization, and parallel processing to achieve 100,000,000+ projections per second on 100 servers. The simple approach is compact, does not increase data storage requirements, and is applicable to any country or region. The fast approach exploits the thread, vector, and memory optimizations that are possible using a low-level language (C++) and achieves similar performance on a single server. This paper details these approaches with the goal of enabling the broader community to quickly integrate location and demographic data.Comment: 8 pages, 7 figures, 55 references; accepted to IEEE HPEC 202

    Weiterentwicklung analytischer Datenbanksysteme

    Get PDF
    This thesis contributes to the state of the art in analytical database systems. First, we identify and explore extensions to better support analytics on event streams. Second, we propose a novel polygon index to enable efficient geospatial data processing in main memory. Third, we contribute a new deep learning approach to cardinality estimation, which is the core problem in cost-based query optimization.Diese Arbeit trägt zum aktuellen Forschungsstand von analytischen Datenbanksystemen bei. Wir identifizieren und explorieren Erweiterungen um Analysen auf Eventströmen besser zu unterstützen. Wir stellen eine neue Indexstruktur für Polygone vor, die eine effiziente Verarbeitung von Geodaten im Hauptspeicher ermöglicht. Zudem präsentieren wir einen neuen Ansatz für Kardinalitätsschätzungen mittels maschinellen Lernens

    Scalable and Quality-Aware Training Data Acquisition for Conversational Cognitive Services

    Full text link
    Dialog Systems (or simply bots) have recently become a popular human-computer interface for performing user's tasks, by invoking the appropriate back-end APIs (Application Programming Interfaces) based on the user's request in natural language. Building task-oriented bots, which aim at performing real-world tasks (e.g., booking flights), has become feasible with the continuous advances in Natural Language Processing (NLP), Artificial Intelligence (AI), and the countless number of devices which allow third-party software systems to invoke their back-end APIs. Nonetheless, bot development technologies are still in their preliminary stages, with several unsolved theoretical and technical challenges stemming from the ambiguous nature of human languages. Given the richness of natural language, supervised models require a large number of user utterances paired with their corresponding tasks -- called intents. To build a bot, developers need to manually translate APIs to utterances (called canonical utterances) and paraphrase them to obtain a diverse set of utterances. Crowdsourcing has been widely used to obtain such datasets, by paraphrasing the initial utterances generated by the bot developers for each task. However, there are several unsolved issues. First, generating canonical utterances requires manual efforts, making bot development both expensive and hard to scale. Second, since crowd workers may be anonymous and are asked to provide open-ended text (paraphrases), crowdsourced paraphrases may be noisy and incorrect (not conveying the same intent as the given task). This thesis first surveys the state-of-the-art approaches for collecting large training utterances for task-oriented bots. Next, we conduct an empirical study to identify quality issues of crowdsourced utterances (e.g., grammatical errors, semantic completeness). Moreover, we propose novel approaches for identifying unqualified crowd workers and eliminating malicious workers from crowdsourcing tasks. Particularly, we propose a novel technique to promote the diversity of crowdsourced paraphrases by dynamically generating word suggestions while crowd workers are paraphrasing a particular utterance. Moreover, we propose a novel technique to automatically translate APIs to canonical utterances. Finally, we present our platform to automatically generate bots out of API specifications. We also conduct thorough experiments to validate the proposed techniques and models

    Special Topics in Information Technology

    Get PDF
    This open access book presents thirteen outstanding doctoral dissertations in Information Technology from the Department of Electronics, Information and Bioengineering, Politecnico di Milano, Italy. Information Technology has always been highly interdisciplinary, as many aspects have to be considered in IT systems. The doctoral studies program in IT at Politecnico di Milano emphasizes this interdisciplinary nature, which is becoming more and more important in recent technological advances, in collaborative projects, and in the education of young researchers. Accordingly, the focus of advanced research is on pursuing a rigorous approach to specific research topics starting from a broad background in various areas of Information Technology, especially Computer Science and Engineering, Electronics, Systems and Control, and Telecommunications. Each year, more than 50 PhDs graduate from the program. This book gathers the outcomes of the thirteen best theses defended in 2019-20 and selected for the IT PhD Award. Each of the authors provides a chapter summarizing his/her findings, including an introduction, description of methods, main achievements and future work on the topic. Hence, the book provides a cutting-edge overview of the latest research trends in Information Technology at Politecnico di Milano, presented in an easy-to-read format that will also appeal to non-specialists

    Forschungsbericht Universität Mannheim 2006 / 2007

    Full text link
    Sie erhalten darin zum einen zusammenfassende Darstellungen zu den Forschungsschwerpunkten und Forschungsprofilen der Universität und deren Entwicklung in der Forschung. Zum anderen gibt der Forschungsbericht einen Überblick über die Publikationen und Forschungsprojekte der Lehrstühle, Professuren und zentralen Forschungseinrichtungen. Diese werden ergänzt um Angaben zur Organisation von Forschungsveranstaltungen, der Mitwirkung in Forschungsausschüssen, einer Übersicht zu den für Forschungszwecke eingeworbenen Drittmitteln, zu den Promotionen und Habilitationen, zu Preisen und Ehrungen und zu Förderern der Universität Mannheim. Darin zeigt sich die Bandbreite und Vielseitigkeit der Forschungsaktivitäten und deren Erfolg auf nationaler und internationaler Ebene

    Bioinspired metaheuristic algorithms for global optimization

    Get PDF
    This paper presents concise comparison study of newly developed bioinspired algorithms for global optimization problems. Three different metaheuristic techniques, namely Accelerated Particle Swarm Optimization (APSO), Firefly Algorithm (FA), and Grey Wolf Optimizer (GWO) are investigated and implemented in Matlab environment. These methods are compared on four unimodal and multimodal nonlinear functions in order to find global optimum values. Computational results indicate that GWO outperforms other intelligent techniques, and that all aforementioned algorithms can be successfully used for optimization of continuous functions

    Experimental Evaluation of Growing and Pruning Hyper Basis Function Neural Networks Trained with Extended Information Filter

    Get PDF
    In this paper we test Extended Information Filter (EIF) for sequential training of Hyper Basis Function Neural Networks with growing and pruning ability (HBF-GP). The HBF neuron allows different scaling of input dimensions to provide better generalization property when dealing with complex nonlinear problems in engineering practice. The main intuition behind HBF is in generalization of Gaussian type of neuron that applies Mahalanobis-like distance as a distance metrics between input training sample and prototype vector. We exploit concept of neuron’s significance and allow growing and pruning of HBF neurons during sequential learning process. From engineer’s perspective, EIF is attractive for training of neural networks because it allows a designer to have scarce initial knowledge of the system/problem. Extensive experimental study shows that HBF neural network trained with EIF achieves same prediction error and compactness of network topology when compared to EKF, but without the need to know initial state uncertainty, which is its main advantage over EKF

    WICC 2017 : XIX Workshop de Investigadores en Ciencias de la ComputaciĂłn

    Get PDF
    Actas del XIX Workshop de Investigadores en Ciencias de la Computación (WICC 2017), realizado en el Instituto Tecnológico de Buenos Aires (ITBA), el 27 y 28 de abril de 2017.Red de Universidades con Carreras en Informática (RedUNCI
    corecore