    HADOOP-EDF: LARGE-SCALE DISTRIBUTED PROCESSING OF ELECTROPHYSIOLOGICAL SIGNAL DATA IN HADOOP MAPREDUCE

    Rapidly growing volumes of electrophysiological signals are being generated for clinical research in neurological disorders. The European Data Format (EDF) is a standard format for storing electrophysiological signals. However, existing signal analysis tools are bottlenecked on large-scale datasets because they load large EDF files sequentially before any analysis can begin. To overcome this, we developed Hadoop-EDF, a distributed signal processing tool that loads EDF data in parallel using Hadoop MapReduce. Hadoop-EDF uses a robust data partitioning algorithm that makes EDF data processable in parallel. We evaluated Hadoop-EDF's scalability and performance using two datasets from the National Sleep Research Resource, running experiments on Amazon Web Services clusters. On a 20-node cluster, Hadoop-EDF performed 27 times and 47 times better than sequential processing for 200 small files and 200 large files, respectively. The results demonstrate that Hadoop-EDF is well suited to processing large EDF files.
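    The paper does not show its partitioning algorithm here, but the core idea it describes — splitting an EDF file so each worker gets whole data records — can be sketched as follows. In EDF, a fixed-size header precedes a sequence of fixed-size data records, so splits that start and end on record boundaries can be parsed independently. The function name and parameters below are hypothetical, not from the paper:

    ```python
    def record_aligned_splits(header_bytes, record_bytes, file_bytes, n_splits):
        """Partition the data-record region of an EDF-like file into
        byte ranges aligned to record boundaries, so each split can be
        parsed independently by a mapper (illustrative sketch)."""
        n_records = (file_bytes - header_bytes) // record_bytes
        per_split = -(-n_records // n_splits)  # ceiling division
        splits = []
        for i in range(0, n_records, per_split):
            count = min(per_split, n_records - i)
            offset = header_bytes + i * record_bytes
            splits.append((offset, count * record_bytes))
        return splits

    # Example: a 768-byte header followed by 100 records of 512 bytes,
    # partitioned for 4 workers -> 4 splits of 25 records each.
    splits = record_aligned_splits(768, 512, 768 + 100 * 512, 4)
    ```

    Because no split crosses a record boundary, a mapper can seek to its offset and decode samples without coordinating with other mappers — the property that makes the EDF data "parallel processable".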

    Big data in epilepsy: Clinical and research considerations. Report from the Epilepsy Big Data Task Force of the International League Against Epilepsy

    Epilepsy is a heterogeneous condition with disparate etiologies and phenotypic and genotypic characteristics. Clinical and research aspects are accordingly varied, ranging from epidemiological to molecular, spanning clinical trials and outcomes, gene and drug discovery, imaging, electroencephalography, pathology, epilepsy surgery, digital technologies, and numerous others. Epilepsy data are collected in the terabytes and petabytes, pushing the limits of current capabilities. Modern computing power and advances in machine and deep learning, pioneered in other diseases, open up exciting possibilities for epilepsy too. However, without carefully designed approaches to acquiring, standardizing, curating, and making available such data, there is a risk of failure. Thus, careful construction of relevant ontologies, with close stakeholder input, provides the requisite scaffolding for more ambitious big data undertakings, such as an epilepsy data commons. In this review, we assess the clinical and research epilepsy landscapes in the big data arena, current challenges, and future directions, and make the case for a systematic approach to epilepsy big data.

    A Scalable Neuroinformatics Data Flow for Electrophysiological Signals using MapReduce

    Data-driven neuroscience research is providing new insights into the progression of neurological disorders and supporting the development of improved treatment approaches. However, the volume, velocity, and variety of neuroscience data from sophisticated recording instruments and acquisition methods have exacerbated the limited scalability of existing neuroinformatics tools. This makes it difficult for neuroscience researchers to effectively leverage the growing multi-modal neuroscience data to advance research in serious neurological disorders, such as epilepsy. We describe the development of the Cloudwave data flow, which uses new data partitioning techniques to store and analyze electrophysiological signals on distributed computing infrastructure. The Cloudwave data flow uses the MapReduce parallel programming model to implement an integrated signal data processing pipeline that scales with the large volumes of data generated at high velocity. Using an epilepsy domain ontology together with an epilepsy-focused, extensible data representation format called the Cloudwave Signal Format (CSF), the data flow addresses the challenge of data heterogeneity and is interoperable with existing neuroinformatics data representation formats, such as HDF5. The scalability of the Cloudwave data flow was evaluated on a 30-node cluster running the open-source Hadoop software stack. The results demonstrate that the Cloudwave data flow can process increasing volumes of signal data by leveraging Hadoop Data Nodes to reduce total processing time. The Cloudwave data flow serves as a template for developing highly scalable neuroscience data processing pipelines using MapReduce to support a variety of user applications.
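    The MapReduce style of pipeline described above can be illustrated with a toy single-process sketch. The channel names, record layout, and per-channel mean statistic below are illustrative assumptions, not details from the paper — a real Cloudwave deployment runs the equivalent map and reduce steps as distributed Hadoop tasks:

    ```python
    from collections import defaultdict

    def map_phase(records):
        # Map step: each input record is a (channel, samples) pair;
        # emit one (channel, sample_value) key-value pair per sample.
        for channel, samples in records:
            for s in samples:
                yield channel, s

    def reduce_phase(pairs):
        # Reduce step: group emitted pairs by channel key and
        # compute a per-channel summary statistic (here, the mean).
        groups = defaultdict(list)
        for channel, value in pairs:
            groups[channel].append(value)
        return {ch: sum(vals) / len(vals) for ch, vals in groups.items()}

    # Hypothetical signal records split across "mappers".
    records = [("EEG-C3", [1.0, 3.0]), ("EEG-C4", [2.0, 4.0]), ("EEG-C3", [5.0])]
    means = reduce_phase(map_phase(records))
    ```

    In Hadoop, the grouping done by `defaultdict` above is performed by the framework's shuffle-and-sort phase between mappers and reducers, which is what lets the pipeline scale out across Data Nodes.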