94 research outputs found

    A distributed workload-aware approach to partitioning geospatial big data for cybergis analytics

    Get PDF
    Numerous applications and scientific domains have contributed to tremendous growth of geospatial data during the past several decades. To resolve the volume and velocity of such big data, distributed system approaches have been extensively studied to partition data for scalable analytics and associated applications. However, previous work on partitioning large geospatial data focuses on bulk-ingestion and static partitioning, hence is unable to handle dynamic variability in both data and computation that are particularly common for streaming data. To eliminate this limitation, this thesis holistically addresses computational intensity and dynamic data workload to achieve optimal data partitioning for scalable geospatial applications. Specifically, novel data partitioning algorithms have been developed to support scalable geospatial and temporal data management with new data models designed to represent dynamic data workload. Optimal partitions are realized by formulating a fine-grain spatial optimization problem that is solved using an evolutionary algorithm with spatially explicit operations. As an overarching approach to integrating the algorithms, data models and spatial optimization problem solving, GeoBalance is established as a workload-aware framework for supporting scalable cyberGIS (i.e. geographic information science and systems based on advanced cyberinfrastructure) analytics

    Novel Architectures for Offloading and Accelerating Computations in Artificial Intelligence and Big Data

    Get PDF
    Due to the end of Moore's Law and Dennard Scaling, performance gains in general-purpose architectures have significantly slowed in recent years. While raising the number of cores has been a viable approach for further performance increases, Amdahl's Law and its implications on parallelization also limit further performance gains. Consequently, research has shifted towards different approaches, including domain-specific custom architectures tailored to specific workloads. This has led to a new golden age for computer architecture, as noted in the Turing Award Lecture by Hennessy and Patterson, which has spawned several new architectures and architectural advances specifically targeted at highly current workloads, including Machine Learning. This thesis introduces a hierarchy of architectural improvements ranging from minor incremental changes, such as High-Bandwidth Memory, to more complex architectural extensions that offload workloads from the general-purpose CPU towards more specialized accelerators. Finally, we introduce novel architectural paradigms, namely Near-Data or In-Network Processing, as the most complex architectural improvements. This cumulative dissertation then investigates several architectural improvements to accelerate Sum-Product Networks, a novel Machine Learning approach from the class of Probabilistic Graphical Models. Furthermore, we use these improvements as case studies to discuss the impact of novel architectures, showing that minor and major architectural changes can significantly increase performance in Machine Learning applications. In addition, this thesis presents recent works on Near-Data Processing, which introduces Smart Storage Devices as a novel architectural paradigm that is especially interesting in the context of Big Data. We discuss how Near-Data Processing can be applied to improve performance in different database settings by offloading database operations to smart storage devices. Offloading data-reductive operations, such as selections, reduces the amount of data transferred, thus improving performance and alleviating bandwidth-related bottlenecks. Using Near-Data Processing as a use-case, we also discuss how Machine Learning approaches, like Sum-Product Networks, can improve novel architectures. Specifically, we introduce an approach for offloading Cardinality Estimation using Sum-Product Networks that could enable more intelligent decision-making in smart storage devices. Overall, we show that Machine Learning can benefit from developing novel architectures while also showing that Machine Learning can be applied to improve the applications of novel architectures

    Towards a New Generation of Permissioned Blockchain Systems

    Get PDF
    With the release of Satoshi Nakamoto's Bitcoin system in 2008 a new decentralized computation paradigm, known as blockchain, was born. Bitcoin promised a trading network for virtual coins, publicly available for anyone to participate in but owned by nobody. Any participant could propose a transaction and a lottery mechanism decided in which order these transactions would be recorded in a ledger with an elegant mechanism to prevent double spending. The remarkable achievement of Nakamoto's protocol was that participants did not have to trust each other to behave correctly for it to work. As long as more than half of the network participants adhered to the correct code, the recorded transactions on the ledger would both be valid and immutable. Ethereum, as the next major blockchain to appear, improved on the initial idea by introducing smart contracts, which are decentralized Turing-complete stored procedures, thus making blockchain technology interesting for the enterprise setting. However, its intrinsically public data and prohibitive energy costs needed to be overcome. This gave rise to a new type of systems called permissioned blockchains. With these, access to the ledger is restricted and trust assumptions about malicious behaviour have been weakened, allowing more efficient consensus mechanisms to find a global order of transactions. One of the most popular representatives of this kind of blockchain is Hyperledger Fabric. While it is much faster and more energy efficient than permissionless blockchains, it has to compete with conventional distributed databases in the enterprise sector. This thesis aims to mitigate Fabric's three major shortcomings. First, compared to conventional database systems, it is still far too slow. This thesis shows how the performance can be increased by a factor of seven by redesigning the transaction processing pipeline and introducing more efficient data structures. Second, we present a novel solution to Fabric's intrinsic problem of a low throughput for workloads with transactions that access the same data. This is achieved by analyzing the dependencies of transactions and selectively re-executing transactions when a conflict is detected. Third, this thesis tackles the preservation of private data. Even though access to the blockchain as a whole can be restricted, in a setting where multiple enterprises collaborate this is not sufficient to protect sensitive proprietary data. Thus, this thesis introduces a new privacy-preserving blockchain protocol based on network sharding and targeted data dissemination. It also introduces an additional layer of abstraction for the creation of transactions and interaction with data on the blockchain. This allows developers to write applications without the need for low-level knowledge of the internal data structure of the blockchain system. In summary, this thesis addresses the shortcomings of the current generation of permission blockchain systems

    Big Data and Artificial Intelligence in Digital Finance

    Get PDF
    This open access book presents how cutting-edge digital technologies like Big Data, Machine Learning, Artificial Intelligence (AI), and Blockchain are set to disrupt the financial sector. The book illustrates how recent advances in these technologies facilitate banks, FinTech, and financial institutions to collect, process, analyze, and fully leverage the very large amounts of data that are nowadays produced and exchanged in the sector. To this end, the book also describes some more the most popular Big Data, AI and Blockchain applications in the sector, including novel applications in the areas of Know Your Customer (KYC), Personalized Wealth Management and Asset Management, Portfolio Risk Assessment, as well as variety of novel Usage-based Insurance applications based on Internet-of-Things data. Most of the presented applications have been developed, deployed and validated in real-life digital finance settings in the context of the European Commission funded INFINITECH project, which is a flagship innovation initiative for Big Data and AI in digital finance. This book is ideal for researchers and practitioners in Big Data, AI, banking and digital finance

    Big Data and Artificial Intelligence in Digital Finance

    Get PDF
    This open access book presents how cutting-edge digital technologies like Big Data, Machine Learning, Artificial Intelligence (AI), and Blockchain are set to disrupt the financial sector. The book illustrates how recent advances in these technologies facilitate banks, FinTech, and financial institutions to collect, process, analyze, and fully leverage the very large amounts of data that are nowadays produced and exchanged in the sector. To this end, the book also describes some more the most popular Big Data, AI and Blockchain applications in the sector, including novel applications in the areas of Know Your Customer (KYC), Personalized Wealth Management and Asset Management, Portfolio Risk Assessment, as well as variety of novel Usage-based Insurance applications based on Internet-of-Things data. Most of the presented applications have been developed, deployed and validated in real-life digital finance settings in the context of the European Commission funded INFINITECH project, which is a flagship innovation initiative for Big Data and AI in digital finance. This book is ideal for researchers and practitioners in Big Data, AI, banking and digital finance

    Forest and Rangeland Soils of the United States Under Changing Conditions

    Get PDF
    This open access book synthesizes leading-edge science and management information about forest and rangeland soils of the United States. It offers ways to better understand changing conditions and their impacts on soils, and explores directions that positively affect the future of forest and rangeland soil health. This book outlines soil processes and identifies the research needed to manage forest and rangeland soils in the United States. Chapters give an overview of the state of forest and rangeland soils research in the Nation, including multi-decadal programs (chapter 1), then summarizes various human-caused and natural impacts and their effects on soil carbon, hydrology, biogeochemistry, and biological diversity (chapters 2–5). Other chapters look at the effects of changing conditions on forest soils in wetland and urban settings (chapters 6–7). Impacts include: climate change, severe wildfires, invasive species, pests and diseases, pollution, and land use change. Chapter 8 considers approaches to maintaining or regaining forest and rangeland soil health in the face of these varied impacts. Mapping, monitoring, and data sharing are discussed in chapter 9 as ways to leverage scientific and human resources to address soil health at scales from the landscape to the individual parcel (monitoring networks, data sharing Web sites, and educational soils-centered programs are tabulated in appendix B). Chapter 10 highlights opportunities for deepening our understanding of soils and for sustaining long-term ecosystem health and appendix C summarizes research needs. Nine regional summaries (appendix A) offer a more detailed look at forest and rangeland soils in the United States and its Affiliates
    • …
    corecore