43 research outputs found

    Query execution in column-oriented database systems

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 145-148). There are two obvious ways to map a two-dimensional relational database table onto a one-dimensional storage interface: store the table row-by-row, or store the table column-by-column. Historically, database system implementations and research have focused on the row-by-row data layout, since it performs best on the most common application for database systems: business transactional data processing. However, there is a set of emerging applications for database systems for which the row-by-row layout performs poorly. These applications are more analytical in nature; their goal is to read through the data to gain new insight and use it to drive decision making and planning. In this dissertation, we study the problem of poor performance of row-by-row data layout for these emerging applications, and evaluate the column-by-column data layout opportunity as a solution to this problem. There have been a variety of proposals in the literature for how to build a database system on top of column-by-column layout. These proposals have different levels of implementation effort, and have different performance characteristics. If one wanted to build a new database system that utilizes the column-by-column data layout, it is unclear which proposal to follow. This dissertation provides (to the best of our knowledge) the only detailed study of multiple implementation approaches of such systems, categorizing the different approaches into three broad categories, and evaluating the tradeoffs between approaches. We conclude that building a query executer specifically designed for the column-by-column query layout is essential to achieve good performance. Consequently, we describe the implementation of C-Store, a new database system with a storage layer and query executer built for column-by-column data layout. 
We introduce three new query execution techniques that significantly improve performance. First, we look at the problem of integrating compression and execution so that the query executer is capable of directly operating on compressed data. This improves performance by improving I/O (less data needs to be read off disk), and CPU (the data need not be decompressed). We describe our solution to the problem of executer extensibility: how can new compression techniques be added to the system without having to rewrite the operator code? Second, we analyze the problem of tuple construction (stitching together attributes from multiple columns into a row-oriented "tuple"). Tuple construction is required when operators need to access multiple attributes from the same tuple; however, if done at the wrong point in a query plan, a significant performance penalty is paid. We introduce an analytical model and some heuristics that help decide when in a query plan tuple construction should occur. Third, we introduce a new join technique, the "invisible join", that improves performance of a specific type of join that is common in the applications for which column-by-column data layout is a good idea. Finally, we benchmark performance of the complete C-Store database system against other column-oriented database system implementation approaches, and against row-oriented databases. We benchmark two applications. The first application is a typical analytical application for which column-by-column data layout is known to outperform row-by-row data layout. The second application is another emerging application, the Semantic Web, for which column-oriented database systems are not currently used. We find that on the first application, the complete C-Store system performed 10 to 18 times faster than alternative column-store implementation approaches, and 6 to 12 times faster than a commercial database system that uses a row-by-row data layout. 
On the Semantic Web application, we find that C-Store outperforms other state-of-the-art data management techniques by an order of magnitude, and outperforms other common data management techniques by almost two orders of magnitude. Benchmark queries, which used to take multiple minutes to execute, can now be answered in several seconds. By Daniel J. Abadi. Ph.D.
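The two storage mappings and the idea of operating directly on compressed data, as described in the abstract above, can be illustrated with a small sketch. This is not C-Store code: the toy table, the function names, and the choice of run-length encoding as the compression scheme are all assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical toy table with columns (order_id, region, quantity).
rows = [
    (1, "east", 5),
    (2, "east", 5),
    (3, "west", 2),
    (4, "west", 2),
    (5, "west", 7),
]

def row_layout(table):
    """Row-by-row mapping: all attributes of one tuple are stored adjacently."""
    return [value for row in table for value in row]

def column_layout(table):
    """Column-by-column mapping: all values of one attribute are stored adjacently."""
    return [list(column) for column in zip(*table)]

def rle_encode(column):
    """Run-length encode a column as (value, run_length) pairs (assumed scheme)."""
    return [(value, len(list(run))) for value, run in groupby(column)]

def rle_sum(runs):
    """Sum a run-length-encoded column without decompressing it first."""
    return sum(value * length for value, length in runs)

columns = column_layout(rows)
quantity_runs = rle_encode(columns[2])
total_quantity = rle_sum(quantity_runs)
```

An aggregate over `quantity` touches only one column in the column layout, and `rle_sum` does one multiplication per run rather than one addition per value, which is the kind of I/O and CPU saving the abstract attributes to executing on compressed data.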

    Model-Based Time Series Management at Scale

    Front Matter - Soft Computing for Data Mining Applications

    Efficient tools and algorithms for knowledge discovery in large data sets have been devised in recent years. These methods exploit the capability of computers to search huge amounts of data in a fast and effective manner. However, the data to be analyzed is imprecise and afflicted with uncertainty. In the case of heterogeneous data sources such as text, audio and video, the data might moreover be ambiguous and partly conflicting. Besides, patterns and relationships of interest are usually vague and approximate. Thus, to make the information mining process more robust, or, say, human-like, methods for searching and learning require tolerance towards imprecision, uncertainty and exceptions. Such methods have approximate reasoning capabilities and are capable of handling partial truth. Properties of the aforementioned kind are typical of soft computing. Soft computing techniques like Genetic

    OctopusDB : flexible and scalable storage management for arbitrary database engines

    We live in a dynamic age with the economy, the technology, and the people around us changing faster than ever before. Consequently, the data management needs in our modern world are much different than those envisioned by the early database inventors in the 70s. Today, enterprises face the challenge of managing ever-growing dataset sizes with dynamically changing query workloads. As a result, modern data managing systems, including relational as well as big data management systems, can no longer afford to be carved-in-stone solutions. Instead, data managing systems must inherently provide flexible data management techniques in order to cope with the constantly changing business needs. The current practice to deal with changing query workloads is to have a different specialized product for each workload type, e.g. row stores for OLTP workload, column stores for OLAP workload, streaming systems for streaming workload, and scan-oriented systems for shared query processing. However, this means that enterprises now have to glue different data managing products together and copy data from one product to another in order to support several query workloads. This has the additional penalty of managing a zoo of data managing systems in the first place, which is tedious, expensive, and counter-productive for modern enterprises. This thesis presents an alternative approach to supporting several query workloads in a data managing system. We observe that each specialized database product has a different data store, indicating that different query workloads work well with different data layouts. A key requirement for supporting several query workloads is therefore to support several data layouts. In this thesis, we study ways to inject different data layouts into existing (and familiar) data managing systems. The goal is to develop a flexible storage layer which can support several query workloads in a single data managing system. 
We present a set of non-invasive techniques, coined Trojan Techniques, to inject different data layouts into a data managing system. The core idea of Trojan Techniques is to drop the assumption of having one fixed data store per data managing system. Trojan Techniques are non-invasive in the sense that they do not make heavy untenable changes to the system. Rather, they affect the data managing system from inside, almost at the core. As a result, Trojan Techniques bring significant improvements in query performance. It is interesting to note that in our approach we follow a design pattern that has been used in other non-invasive research works as well, such as PAX, fractal prefetching B+-trees, and RowCol. We propose four Trojan Techniques. First, Trojan Indexes add an additional index access path in Hadoop MapReduce. Second, Trojan Joins allow for co-partitioned joins in Hadoop MapReduce. Third, Trojan Layouts allow for row, column, or column-grouped layouts in Hadoop MapReduce. Together, these three techniques provide a highly flexible data storage layer for Hadoop MapReduce. Our final proposal, Trojan Columns, introduces columnar functionality in row-oriented relational databases, including closed source commercial databases, thus bridging the gap between row and column oriented databases. Our experimental results show that Trojan Techniques can improve the performance of Hadoop MapReduce by a factor of up to 18, and that of a top-notch commercial database product by a factor of up to 17.
We live in a dynamic age in which the economy, technology, and society are changing faster than ever before. Consequently, today's data processing requirements differ greatly from what the pioneers of this research field originally envisioned in the 1970s. Today, companies face the challenge of handling strongly fluctuating query loads over steadily growing volumes of data. 
Modern database systems, relational as well as big data systems, can therefore no longer afford to function as rigid, carved-in-stone solutions. Instead, modern database systems should be designed from the ground up for flexible data management in order to keep pace with constantly changing requirements. The current practice for dealing with frequently changing query patterns, however, is still to use a different specialized solution for each query type, for example row-oriented systems for OLTP queries, column-oriented systems for OLAP queries, data stream management systems for continuous data streams, and scan-based systems for processing many concurrent queries. Unfortunately, this approach requires enterprises to somehow link the various systems together and to ensure data exchange between them. An additional drawback is that a whole assortment of database products often has to be set up and maintained, which is both time- and cost-intensive and thus ultimately laborious. This dissertation presents an alternative solution for efficiently supporting changing query patterns with a single data management system. From the observation that each specialized database product uses a different approach to data storage, we conclude that different queries can be answered more efficiently on certain data layouts than on others. A central requirement for efficiently processing different query types with just one system is therefore that the system must support different data layouts. To this end, we investigate ways to inject different data layouts retroactively into existing (and familiar) database systems. 
The goal is to develop a flexible storage layer that can support a wide variety of queries in a single database system. To this end, we have developed a set of non-invasive techniques, also called Trojan Techniques, with which different data layouts can be injected retroactively into existing systems. The basic idea behind these Trojan Techniques is to drop the assumption that each database system can have only one fixed kind of data storage. The Trojan Techniques require only minimal changes to the original database system; instead, they influence its behavior from within. Using Trojan Techniques can considerably increase query speed. With this approach we follow a design pattern that has also been used in other non-invasive research projects such as PAX, fpB+-trees, and RowCol. In this work we present four different Trojan Techniques. First, we show how Trojan Indexes enable the integration of an index into Hadoop MapReduce. This is complemented by Trojan Joins, which enable co-partitioned joins in Hadoop MapReduce. We then show how Trojan Layouts extend Hadoop MapReduce with row-oriented, column-oriented, and column-grouped data layouts. Together, these techniques form a flexible storage layer for the Hadoop MapReduce framework. Our fourth technique, Trojan Columns, allows us to introduce column-oriented data processing retroactively into row-based database systems and can even be applied to commercial closed-source products. We thereby close the gap between row- and column-oriented database systems. 
In our experiments we show that the Trojan Techniques can increase the performance of the Hadoop MapReduce framework by up to 18 times and the speed of a current commercial database by 17 times.
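The abstract mentions co-partitioned joins without giving details. The sketch below shows only the general idea of co-partitioning: both inputs are split by the same function of the join key, so each pair of partitions can be joined locally. It does not use the Hadoop MapReduce API and is not the thesis's implementation; the function names and the hash-modulo partitioning are assumptions.

```python
from collections import defaultdict

def partition_by_key(records, key_index, num_partitions):
    """Assign each record to a partition using a hash of its join key."""
    parts = [[] for _ in range(num_partitions)]
    for record in records:
        parts[hash(record[key_index]) % num_partitions].append(record)
    return parts

def co_partitioned_join(left, right, left_key, right_key, num_partitions=4):
    """Join two relations partition by partition.

    Because both sides use the same partitioning function, matching keys
    always land in partitions with the same number, so no partition ever
    needs records from a differently numbered partition on the other side.
    """
    left_parts = partition_by_key(left, left_key, num_partitions)
    right_parts = partition_by_key(right, right_key, num_partitions)
    joined = []
    for l_part, r_part in zip(left_parts, right_parts):
        # Local hash join inside one pair of co-located partitions.
        index = defaultdict(list)
        for r in r_part:
            index[r[right_key]].append(r)
        for l in l_part:
            for r in index.get(l[left_key], []):
                joined.append(l + r)
    return joined

# Hypothetical example data: customers (id, name) and orders (order_id, customer_id).
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
orders = [(10, 1), (11, 3), (12, 1)]
result = sorted(co_partitioned_join(customers, orders, 0, 1))
```

In a distributed setting the benefit would come from doing the partitioning once, when the data is stored, so that the per-partition joins need no data movement at query time; the sketch partitions at call time only to stay self-contained.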

    Spatiotemporal enabled Content-based Image Retrieval

    Rethinking FPGA Architectures for Deep Neural Network applications

    The prominence of machine learning-powered solutions has instituted an unprecedented trend of integration into virtually all applications, with a broad range of deployment constraints from tiny embedded systems to large-scale warehouse computing machines. While recent research confirms the advantages of using contemporary FPGAs to deploy or accelerate machine learning applications, especially where latency and energy consumption are strictly limited, their architectures, optimised before the rise of machine learning, remain a barrier to overall efficiency and performance. Recognising this shortcoming, this thesis presents an architectural study aimed at solutions that unlock hidden potential in FPGA technology, primarily for machine learning algorithms. In particular, it shows how slight alterations to state-of-the-art architectures could make FPGAs significantly more machine learning-friendly while maintaining near-promised performance for the rest of the applications. Finally, it presents a novel systematic approach to deriving new block architectures, guided by design limitations and machine learning algorithm characteristics, through benchmarking. First, through three modifications to Xilinx DSP48E2 blocks, an enhanced digital signal processing (DSP) block for important computations in embedded deep neural network (DNN) accelerators is described. Then, two tiers of modifications to FPGA logic cell architecture are explained that deliver a variety of performance and utilisation benefits with only minor area overheads. Lastly, with the goal of exploring this new design space in a methodical manner, a problem formulation involving computing nested loops over multiply-accumulate (MAC) operations is first proposed. A quantitative methodology for deriving efficient coarse-grained compute block architectures from benchmarks is then suggested, together with a family of new embedded blocks called MLBlocks.
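The phrase "nested loops over multiply-accumulate (MAC) operations" can be made concrete with a small software sketch. The example below computes a dense (fully connected) layer as two nested loops around a single MAC step. It is an illustration only: the function names are hypothetical, and the thesis's actual problem formulation and the MLBlocks hardware are not reproduced here.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: acc + a * b."""
    return acc + a * b

def dense_layer(weights, inputs):
    """Matrix-vector product written as nested loops over MAC operations."""
    outputs = []
    for row in weights:                # outer loop: one output per weight row
        acc = 0
        for w, x in zip(row, inputs):  # inner loop: accumulate the products
            acc = mac(acc, w, x)
        outputs.append(acc)
    return outputs

# Hypothetical 2x2 weight matrix and 2-element input vector.
example_output = dense_layer([[1, 2], [3, 4]], [5, 6])
```

Convolutions and other DNN kernels add further loop levels around the same inner MAC step, which is why the loop nest is a natural unit to map onto a compute block.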

    Understanding the Legal Construct Regulating Government Intervention into City Decline and Degeneration in America

    An overview of critical academic thought concerning the character and attributes of American urban development establishes that the presence of unsuccessful, or challenged, development is a transcending problem necessitating government regulation in response. Challenged developments were observed frequently materializing in areas exhibiting urban decline and degeneration, including outward migration. It was conjectured that this cycle of outward migration and urban decline and degeneration might be part of an overall development cycle experienced by more than current day cities. History was probed for evidence of commonality. Cycles of urban decline and degeneration appeared within Mesopotamia, Egypt, the Greek city-states, and the Roman Empire. The form of government, whether a benevolent priest-king, dictator, democratic assembly or republic council appears extraneous. The mere presence of governmental regulation, such as comprehensive planning, zoning, building codes, advanced development techniques or sophisticated legal concepts for the protection of individual rights, did not purport to dissuade or ameliorate these cycles throughout the ages. Historical accounts attributed successful urban concentration to the presence of safety and security, convenience, and quality of life. Conversely, when one or more of these factors were diminished or compromised, cycles of urban decline and degeneration seemed to emerge. Field research was conducted to ascertain how these historical observations fared in the modern context. Residential and commercial developments differentiated as successful and challenged within the fifty (50) fastest growing counties across the United States between 2000 and 2010 pursuant to the U.S. Census Bureau were surveyed to explore the presence of governmental regulation and procedures as well as factors affecting safety and security, convenience, and quality of life. 
Consistent with historical observations, only items connected with safety and security, convenience and quality of life emerged from this process. Based upon this knowledge, local governments may be prompted to intervene at the development stage of residential and commercial developments in an attempt to counter, forestall or at least lessen the impact of the cycle of outward migration and urban decline and degeneration. While this could be attempted ad hoc, a more prudent approach might be to re-examine and re-constitute existing zoning, subdivision and development regulations and procedures in light of the differential characteristics between successful versus challenged developments. However, such an undertaking does not happen in a legal "state of nature." A synthesis of the jurisprudence that defines the limits of and restraints upon current governmental regulation reveals that land use regulation in America centers around the interaction between the authority of a local government to act, pursuant to "police power" authority granted that local government from the state, and whether that government action violates an individual's Constitutional rights. These Constitutional rights center around the privileges and immunities of citizens, equal protections of the laws and due process clauses of the Fourteenth Amendment and include "regulatory takings" under the theory of inverse condemnation. The United States Supreme Court has undertaken the long and arduous task of defining this interaction. A summation of that current definition is contained in Arkansas Game and Fish Comm'n v. United States where the Court expounded that when regulation or temporary physical invasion by government interferes with private property, time is a factor in determining the existence of a compensable taking. Also relevant is the degree to which the invasion is intended or is the foreseeable result of authorized government action. 
So too, is the character of the land at issue and the owner's "reasonable investment-backed expectations" regarding the land's use. Severity of the interference figures in the calculus as well. While a single act may not be enough, a continuance of them in sufficient number and for a sufficient time may prove a taking. Every successive trespass adds to the force of the evidence. This current understanding of the interaction between the exercise of government regulation and takings jurisprudence lays the groundwork for thoughtful and legally permissible implementation and application of zoning, subdivision and developmental regulations and processes aimed at addressing the cycle of outward migration and urban decline and degeneration at the initial development stage as well as subsequently thereto

    On The Corporeal Exchange: Thai Boxing's Sacrificial Movement

    This dissertation is an ethnographic study of Thai boxing (muay Thai) understood as sacrificial exchange, exploring the practice of this martial art in the context of contemporary Thai society. Drawing on two years of apprenticeship and participation research in Northeast Thailand and Bangkok, I consider the fighters’ integration in broader patterns of seasonal labor migration as they move between rural, regional tournaments and Bangkok stadiums. Focusing on the training of one particular boxer, I investigate interactions between trainers, managers, family, patrons and ancestral spirits. The boxers’ embodied actions as they unfold in time represent the sovereign relationship between living and dead, nature and culture, performatively establishing the boundaries between growth and decay. As the living move through a world of animate social relations, accruing debt, the boxer’s embodied patterns of repetition and exhaustion in training, and of destructive action in combat, create a possibility for shifting this balance, accruing merit for those otherwise occupied in handling materials which support the powerful, and transforming the established hierarchical order of everyday life. Against the background of the impermanent, closed, linear, cyclical or progressive temporalities of monasteries, factories, the military and the monarchy, the temporality of the ring remains open, giving fighters the elbow-room to performatively engage crucial symbols of life and death, male and female, human and animal, affording otherwise politically disempowered Northeastern Thai families the opportunity to create meaning and possibility in their lives. Acting as both victim and executioner, fighters accrue credit for the assembled audience, reinvesting each tier of the community with a degree of responsibility for life. 
I argue that these practices occur within a ‘deathworld’, in which the heightened attentiveness to the limited possibilities for action reaffirms the local position of the individual within the collective. With embodied motion that cuts across local categories of stillness and mobility, the living and the dead, with ever-greater stamina, Thai boxers become increasingly valuable and credit-able, paying the debts, material and spiritual, that their assembled supporters have incurred as they live their kinetically excessive lives, allowing men throughout the community to remain accountable to Kings, Buddha, ancestors, factories and patrons. Doctor of Philosoph

    Deconstructing a biofuel hype : the stories of jatropha projects in South Sulawesi, Indonesia

    This research took place in South Sulawesi in order to investigate the implementation of jatropha projects in the period of 2006-2011. This research aims to understand the key factors that were influential in the rise and fall of jatropha projects. The analysis focused on jatropha actors’ motivations, strategies and experiences to understand what opportunities and benefits the involved actors pursued and how the achievement of those opportunities and benefits redefines the failure of the projects. The findings were synthesized to draw lessons from the observed jatropha stories for other miracle crops. The Netherlands Royal Academy of Sciences (KNAW) and the Netherlands Organization for Scientific Research (NWO) Global Challenges (FSW

    An aesthetic for sustainable interactions in product-service systems?

    Copyright © 2012 Greenleaf Publishing. Eco-efficient Product-Service System (PSS) innovations represent a promising approach to sustainability. However, the application of this concept is still very limited because its implementation and diffusion are hindered by several barriers (cultural, corporate and regulative ones). The paper investigates the barriers that affect the attractiveness and acceptance of eco-efficient PSS alternatives, and opens the debate on the aesthetic of eco-efficient PSS and the way in which aesthetic could enhance some specific inner qualities of this kind of innovation. Integrating insights from semiotics, the paper outlines some first research hypotheses on how the aesthetic elements of an eco-efficient PSS could facilitate user attraction, acceptance and satisfaction