3 research outputs found

    D4M 3.0: Extended Database and Language Capabilities

    Full text link
    The D4M tool was developed to address many of today's data needs. This tool is used by hundreds of researchers to perform complex analytics on unstructured data. Over the past few years, the D4M toolbox has evolved to support connectivity with a variety of new database engines, including SciDB. D4M-Graphulo provides the ability to do graph analytics in the Apache Accumulo database. Finally, an implementation using the Julia programming language is also now available. In this article, we describe some of our latest additions to the D4M toolbox and our upcoming D4M 3.0 release. We show through benchmarking and scaling results that we can achieve fast SciDB ingest using the D4M-SciDB connector, that using Graphulo can enable graph algorithms on scales that can be memory limited, and that the Julia implementation of D4M achieves comparable performance or exceeds that of the existing MATLAB(R) implementation.Comment: IEEE HPEC 201

    Database Operations in D4M.jl

    Full text link
    Each step in the data analytics pipeline is important, including database ingest and query. The D4M-Accumulo database connector has allowed analysts to quickly and easily ingest to and query from Apache Accumulo using MATLAB(R)/GNU Octave syntax. D4M.jl, a Julia implementation of D4M, provides much of the functionality of the original D4M implementation to the Julia community. In this work, we extend D4M.jl to include many of the same database capabilities that the MATLAB(R)/GNU Octave implementation provides. Here we will describe the D4M.jl database connector, demonstrate how it can be used, and show that it has comparable or better performance to the original implementation in MATLAB(R)/GNU Octave.Comment: IEEE HPEC 2018. arXiv admin note: text overlap with arXiv:1708.0293

    Julia implementation of the Dynamic Distributed Dimensional Data Model

    Get PDF
    Julia is a new language for writing data analysis programs that are easy to implement and run at high performance. Similarly, the Dynamic Distributed Dimensional Data Model (D4M) aims to clarify data analysis operations while retaining strong performance. D4M accomplishes these goals through a composable, unified data model on associative arrays. In this work, we present an implementation of D4M in Julia and describe how it enables and facilitates data analysis. Several experiments showcase scalable performance in our new Julia version as compared to the original Matlab implementation
    corecore