    Expandable open addressing hash table storage and retrieval

    Leveraging Emerging Hardware to Improve the Performance of Data Analytics Frameworks

    Department of Computer Science and Engineering. Data analytics frameworks have evolved along with the growing amount of data. There have been numerous efforts to improve the performance of data analytics frameworks, including MapReduce frameworks and NoSQL and NewSQL databases. These frameworks target different workloads and have their own characteristics; however, as data analytics frameworks they share common ground. Emerging hardware such as graphics processing units (GPUs) and persistent memory is expected to open up new opportunities across that common ground. The goal of this dissertation is to leverage emerging hardware to improve the performance of data analytics frameworks. First, we design and implement EclipseMR, a novel MapReduce framework that efficiently leverages the large amount of memory distributed among the machines in a cluster. EclipseMR consists of a decentralized DHT-based file system layer and an in-memory cache layer. The in-memory cache layer is designed to store both local and remote data while balancing the load between servers with the proposed Locality-Aware Fair (LAF) job scheduler. The design of EclipseMR is easily extensible with emerging hardware: it can adopt persistent memory as a primary storage layer or cache layer, or it can adopt GPUs to improve the performance of map and reduce functions. Our evaluation shows that EclipseMR outperforms Hadoop and Spark for various applications. Second, we propose B³-tree and Cache-Conscious Extendible Hashing (CCEH) for persistent memory. The fundamental challenge in designing a data structure for persistent memory is to guarantee consistent state transitions at minimum cost using only fine-grained 8-byte atomic writes.
B³-tree is a fully persistent hybrid indexing structure that combines a binary tree and a B+-tree and benefits from the strengths of both in-memory and block-based indexes. CCEH is a variant of extendible hashing that introduces an intermediate layer between the directory and the buckets to fully benefit from cache-sized buckets while minimizing the size of the directory. Both data structures perform better than the corresponding state-of-the-art techniques. Third, we develop a data-parallel tree traversal algorithm, Parallel Scan and Backtrack (PSB), for the k-nearest neighbor search problem on the GPU. Several studies have proposed using the GPU as an accelerator to improve query performance; however, most of them focus on brute-force algorithms. In this work, we overcome the challenges of traversing a multi-dimensional hierarchical indexing structure on the GPU, such as the tiny shared memory and runtime stack, irregular memory access patterns, and warp divergence. Our evaluation shows that our data-parallel PSB algorithm outperforms both the brute-force algorithm and the traditional branch-and-bound algorithm.
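For readers unfamiliar with the extendible hashing that CCEH builds on, the following is a minimal sketch of the classic directory-and-bucket scheme only. It is not the CCEH design from the dissertation: it keeps everything in volatile memory, has no intermediate layer between directory and buckets, and makes no attempt at failure-atomic 8-byte writes. The names `Bucket`, `ExtendibleHash`, and `bucket_capacity` are hypothetical and chosen for illustration.

```python
class Bucket:
    """Fixed-capacity bucket; local_depth = number of hash bits it is keyed on."""

    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth
        self.capacity = capacity
        self.items = {}


class ExtendibleHash:
    """Classic extendible hashing (illustrative sketch, not CCEH)."""

    def __init__(self, bucket_capacity=4):
        self.global_depth = 1
        self.bucket_capacity = bucket_capacity
        self.directory = [Bucket(1, bucket_capacity), Bucket(1, bucket_capacity)]

    def _index(self, key):
        # The directory is indexed by the low-order global_depth bits of the hash.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key, default=None):
        return self.directory[self._index(key)].items.get(key, default)

    def put(self, key, value):
        # Assumption: no more than bucket_capacity distinct keys share an
        # identical full hash value; otherwise splitting would never terminate.
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; entry i and entry i + old_size share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth, bucket.capacity)
        new_bit = 1 << (bucket.local_depth - 1)
        # Redirect directory entries whose newly significant bit is set.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling
        # Redistribute the old bucket's items on the same bit.
        old_items = bucket.items
        bucket.items = {}
        for k, v in old_items.items():
            target = sibling if hash(k) & new_bit else bucket
            target.items[k] = v


table = ExtendibleHash(bucket_capacity=2)
for i in range(100):
    table.put(i, i * i)
```

Only the overflowing bucket is split and the directory doubles only when that bucket already uses all directory bits, which is why the directory can grow large when buckets are small. That directory growth is the cost the abstract says CCEH's intermediate layer is meant to reduce.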

    A parameterized model for selecting the optimum file organization in multi-attribute retrieval systems.

    Massachusetts Institute of Technology, Alfred P. Sloan School of Management. Thesis. 1974. M.S. Microfiche copy also available in Dewey Library. Bibliography: leaves 135-142.

    Decoupling Information and Connectivity via Information-Centric Transport

    The power of Information-Centric Networking architectures (ICNs) lies in their abstraction for communication: the request for named data. This abstraction was popularized by the HyperText Transfer Protocol (HTTP) as an application-layer abstraction, and was extended by ICNs to also serve as their network-layer abstraction. In recent years, network mechanisms for ICNs, such as scalable name-based forwarding, named-data routing, and in-network caching, have been widely explored and researched. However, to the best of our knowledge, the impact of this network abstraction on ICN applications has not been explored or well understood. The motivation of this dissertation is to address this research gap. Presumably, shifting from IP's channel abstraction, in which two endpoints must establish a channel to communicate, to the request-for-named-data abstraction in ICNs should simplify application mechanisms. This is not only because those mechanisms are no longer required to translate name-based requests to addresses of endpoints, but mainly because application mechanisms are no longer coupled with the connectivity characteristics of the channel. Hence, applications do not need to worry about whether there is a synchronous end-to-end path between two endpoints, or whether a device along the path switches between concurrent interfaces for communication. Therefore, ICN architectures present a new and powerful promise to applications: the freedom to stay in the information plane, decoupled from connectivity. This dissertation shows that despite this powerful promise, the information and connectivity planes are presently coupled in today's incarnations of leading ICNs by a core architectural component, the forwarding strategy.
Therefore, this dissertation defines the role of forwarding strategies, and it introduces Information-Centric Transport (ICT) as a new architectural component that application developers can rely on if they want their application to be decoupled from connectivity. When discussing the role of ICT, we explain the importance of in-network transport mechanisms in ICNs, and we explore how those mechanisms can be scalable when generalized to serve broadly applicable application needs. To illustrate our contribution concretely, we present three group communication abstractions that can evolve into ICTs: 1) Data synchronization of named data. This abstraction supports applications that want to keep a group's shared dataset consistent over time. 2) Push-like notifications for the latest named data. This abstraction supports applications that want to quickly notify, and be notified about, the latest content produced by one or more members of the group. 3) Distributed named data fetching when the content is partitioned. This abstraction supports applications whose named data is partitioned and distributed in the group, where the names of content items in a partition cannot be generalized and hierarchically represented using one partition name. For each ICT, we provide examples of known applications that can use it, we discuss different mechanisms for implementation, and we evaluate selected implementations. We show how, by relying on an ICT instead of a forwarding strategy, the tested applications can maintain sustainable communication under connectivity conditions where IP tools fail or do not work well.

    Database Organization

    Course description. The course covers the theoretical foundations, practical methods, and tools for building databases, as well as issues related to the database life cycle, maintenance, and support. It introduces the basic concepts of databases, ways of classifying them, the principles of organizing data structures, and the corresponding types of database management systems (DBMS). The relational data model, normalization theory, and DBMSs that follow this model (using MS SQL Server as an example) are studied in detail, along with SQL, the standard query language for relational DBMSs, and methods for representing complex data structures with a relational DBMS. The course examines how shared multi-user access to data is organized and introduces the concepts of referential integrity, semantic integrity, and transactions, together with the associated problems and the methods for solving them. It also addresses data preservation and security, backup methods, and data compression. An overview is given of hierarchical, non-relational and post-relational, object-oriented, full-text, network, and distributed DBMSs. Students learn to build an ER model with the Entity Framework tools in Visual Studio and to create a database application in the Visual Studio development environment in C#.
Abstract of the course "Database Organization". The purpose of the course is to develop students' understanding of the role of automated data banks in the creation of information systems. The objectives of the course are: to study the data models supported by different database management systems (DBMS); to study non-relational models; to study elements of relational database theory; to become familiar with the principles of DBMS construction; and to study distributed DBMSs and the application development tools for these DBMSs.