21 research outputs found

    Cloudlet-based just-in-time indexing of IoT video


    Database architecture evolution: Mammals flourished long before dinosaurs became extinct

    The holy grail of database architecture research is to find a solution that is Scalable & Speedy, to run on anything from small ARM processors up to globally distributed compute clusters; Stable & Secure, to serve a broad user community; Small & Simple, to be comprehensible to a small team of programmers; and Self-managing, to run out of the box without hassle. In this paper, we provide a trip report on this quest, covering past experiences, ongoing research on hardware-conscious algorithms, and novel ways towards self-management, with a specific focus on column-store solutions.

    Efficient Data Flow Constraint Analysis

    Current developments in software engineering show a trend towards the decentralization of software systems. With the use of techniques such as cloud computing and microservices, ever more data flows over public networks or through third-party infrastructure. At the same time, recent legal changes such as the General Data Protection Regulation make it increasingly important for software developers to ensure that the data flows of their software comply with legal restrictions. To make this possible despite the steadily growing complexity of software systems, various model-based approaches at the architecture level have been proposed. A drawback of most of these approaches, however, is that they often do not allow a fully automated analysis of violations of data flow constraints: frequently no automated analyses are possible at all, or analyses must be developed individually for each scenario. We therefore propose a new metamodel for describing the data flows of software systems. This metamodel is designed so that instances can be automatically translated into a program in the logic programming language Prolog. This program then allows rules for automatically checking compliance with data flow constraints to be formulated easily. An important aspect of the design and implementation of our approach is scalability: the goal is to ensure that our approach remains efficient to apply. In particular, we developed techniques for optimizing Prolog programs whose applicability is not limited to our approach. Furthermore, we conducted an extensive evaluation of our approach, examining its accuracy, scalability, and generality.
We showed that our approach works accurately for several kinds of scenarios while exhibiting good scalability. It turned out that our proposed optimizations can, in some cases, even reduce the runtime from exponential to constant.

    Data oriented design in video games

    Object-Oriented Programming is the paradigm currently used in the video-game industry and taught to students and people wanting to become video-game developers. The primary characteristics of this methodology, and the way they are used, can be considered flaws in their own right, often overlooked thanks to the rapid advance of hardware capability. Since this situation may not be sustainable, this thesis presents the Data-Oriented Design paradigm as an alternative, offering better control over the hardware and resulting in a more efficient product. To test whether this method is indeed more efficient, two projects were developed, one per paradigm, using C/C++ in the Visual Studio environment. In each, the simple entity structure defined by the corresponding paradigm was created. By setting a maximum number of objects to simulate and a time limit, along with inserted time-control code, the applications themselves produce update-time metrics for analysis. Moreover, a profiler was used to benchmark L1 cache usage and check which of them makes better use of the cache. The gathered data was studied using RStudio, and the cache metrics show that Data-Oriented Design is indeed more efficient and cache-friendly, making it a strong contender if given the chance and proving up to 70 times faster than Object-Oriented Programming in the case studied.
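The contrast this thesis measures can be sketched in a few lines. The following is a minimal illustration, not the thesis's actual code: an object-oriented entity layout interleaves all fields per entity (array-of-structs), while a data-oriented layout keeps each field contiguous (struct-of-arrays), so an update loop streams only the data it touches. All names are hypothetical.

```cpp
#include <vector>
#include <cstddef>

// Array-of-structs (the OOP-style layout): each entity's fields are
// interleaved in memory, so updating positions also drags "cold" fields
// like `id` through the cache.
struct EntityAoS {
    float x, y;     // position
    float vx, vy;   // velocity
    int   id;       // unrelated data, loaded into cache anyway
};

void update_aos(std::vector<EntityAoS>& es, float dt) {
    for (auto& e : es) { e.x += e.vx * dt; e.y += e.vy * dt; }
}

// Struct-of-arrays (the data-oriented layout): each field is a contiguous
// array, so the update loop reads only positions and velocities, making
// far better use of L1 cache lines.
struct EntitiesSoA {
    std::vector<float> x, y, vx, vy;
    std::vector<int>   id;
};

void update_soa(EntitiesSoA& es, float dt) {
    for (std::size_t i = 0; i < es.x.size(); ++i) {
        es.x[i] += es.vx[i] * dt;
        es.y[i] += es.vy[i] * dt;
    }
}
```

Both loops compute the same result; the difference the thesis's profiler measurements capture is purely in memory-access pattern.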

    Query processing in peer-to-peer based data management system

    No full text
    Ph.D. (Doctor of Philosophy)

    Algorithms and Data Structures for In-Memory Text Search Engines


    Scalable Architecture for Integrated Batch and Streaming Analysis of Big Data

    Thesis (Ph.D.) - Indiana University, Computer Sciences, 2015
    As Big Data processing problems evolve, many modern applications demonstrate special characteristics. Data exists in the form of both large historical datasets and high-speed real-time streams, and many analysis pipelines require integrated parallel batch processing and stream processing. Despite the large size of the whole dataset, most analyses focus on specific subsets according to certain criteria. Correspondingly, integrated support for efficient queries and post-query analysis is required. To address the system-level requirements brought by these characteristics, this dissertation proposes a scalable architecture for integrated queries, batch analysis, and streaming analysis of Big Data in the cloud. We verify its effectiveness using a representative application domain - social media data analysis - and tackle related research challenges emerging from each module of the architecture by integrating and extending multiple state-of-the-art Big Data storage and processing systems. In the storage layer, we reveal that existing text indexing techniques do not work well for the unique queries of social data, which put constraints on both textual content and social context. To address this issue, we propose a flexible indexing framework over NoSQL databases that supports fully customizable index structures, which can embed the necessary social context information for efficient queries. The batch analysis module demonstrates that analysis workflows consist of multiple algorithms with different computation and communication patterns, which suit different processing frameworks. To achieve efficient workflows, we build an integrated analysis stack based on YARN and make novel use of customized indices in developing sophisticated analysis algorithms.
In the streaming analysis module, the high-dimensional representation of social media streams poses special challenges for parallel stream clustering. Due to the sparsity of the high-dimensional data, traditional synchronization methods become expensive and severely impact the scalability of the algorithm. We therefore design a novel strategy that broadcasts the incremental changes rather than the whole centroids of the clusters, yielding scalable parallel stream clustering algorithms. Performance tests using real applications show that our solutions for parallel data loading/indexing, queries, analysis tasks, and stream clustering all significantly outperform implementations using current state-of-the-art technologies.
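The customizable-index idea from the storage layer can be illustrated with a toy posting list. This is a hypothetical sketch, not the dissertation's framework: instead of a plain term-to-document mapping, each posting embeds a piece of social context (here an author id), so a query constraining both text and context is answered from the index alone. All names are invented for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Each posting carries social context alongside the document id, so a
// combined text + context query never has to fetch document content.
struct Posting { long doc; long author; };

struct SocialIndex {
    std::map<std::string, std::vector<Posting>> postings;

    void add(long doc, long author, const std::vector<std::string>& terms) {
        for (const auto& t : terms) postings[t].push_back({doc, author});
    }

    // Documents containing `term` written by `author`, resolved entirely
    // from the index.
    std::vector<long> query(const std::string& term, long author) const {
        std::vector<long> out;
        auto it = postings.find(term);
        if (it == postings.end()) return out;
        for (const auto& p : it->second)
            if (p.author == author) out.push_back(p.doc);
        return out;
    }
};
```

A conventional text index would answer the term part, then filter by author in a second pass over fetched documents; embedding the context in the posting collapses that into one index scan.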
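The incremental-broadcast strategy for the streaming module can also be sketched. The code below is an illustrative assumption about the scheme, not the dissertation's implementation: because high-dimensional points are sparse, assigning a point to a centroid changes only a few dimensions, so a worker can broadcast just those (dimension, delta) pairs rather than the full dense centroid, and a replica stays synchronized by applying the deltas.

```cpp
#include <unordered_map>
#include <vector>

using SparseVec = std::unordered_map<int, double>; // dimension -> value

struct Centroid {
    std::vector<double> sum;  // per-dimension running sum of member points
    long count = 0;
    double mean(int dim) const { return count ? sum[dim] / count : 0.0; }
};

// Assign a sparse point to a centroid; the returned sparse delta is the
// only message that needs to be broadcast to other workers.
SparseVec add_point(Centroid& c, const SparseVec& point) {
    ++c.count;
    SparseVec delta;
    for (const auto& [dim, v] : point) {
        c.sum[dim] += v;
        delta[dim] = v;
    }
    return delta;
}

// A replica applies the broadcast delta without seeing the point stream,
// touching only the few dimensions present in the message.
void apply_delta(Centroid& c, const SparseVec& delta) {
    ++c.count;
    for (const auto& [dim, d] : delta) c.sum[dim] += d;
}
```

The payload per update is proportional to the point's nonzero dimensions rather than to the full dimensionality, which is where the scalability gain described above comes from.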