24 research outputs found

    Enhancing health risk prediction with deep learning on big data and revised fusion node paradigm

    Get PDF
    With recent advances in health systems, the amount of health data is expanding rapidly and in various formats. This data originates from many new sources, including digital records, mobile devices, and wearable health devices. Big health data offers more opportunities for health data analysis and for enhancing health services through innovative approaches. The objective of this research is to develop a framework that enhances health prediction with the revised fusion node and deep learning paradigms. The fusion node is an information fusion model for constructing prediction systems. Deep learning involves the layered application of machine-learning algorithms, such as Bayesian fusion and neural networks, for data extraction and logical inference. Deep learning, combined with information fusion paradigms, can be utilized to provide more comprehensive and reliable predictions from big health data. Based on the proposed framework, an experimental system is developed to illustrate the framework's implementation.
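
    As a rough illustration of the information-fusion idea (the paper's revised fusion node is not specified in the abstract, so the weighting scheme and model outputs below are assumptions), a fusion node can be sketched as a weighted combination of probabilistic predictions from several source models:

```python
# Minimal sketch of an information-fusion node combining probabilistic
# risk predictions from several models. Weights and outputs are
# illustrative assumptions, not the paper's "revised fusion node".
import numpy as np

def fusion_node(predictions, weights=None):
    """Combine per-model class-probability vectors into one prediction.

    predictions: shape (n_models, n_classes), each row a probability
                 distribution produced by one source model.
    weights:     optional reliability weight per model.
    """
    predictions = np.asarray(predictions, dtype=float)
    if weights is None:
        weights = np.ones(len(predictions))
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    fused = weights @ predictions   # weighted average of posteriors
    return fused / fused.sum()      # renormalize to a distribution

# Example: three models scoring one patient (low/medium/high risk).
models_out = [[0.7, 0.2, 0.1],
              [0.6, 0.3, 0.1],
              [0.5, 0.3, 0.2]]
print(fusion_node(models_out, weights=[2.0, 1.0, 1.0]))
```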

    Deep Learning on Big Data Sets in the Cloud with Apache Spark and Google TensorFlow

    Get PDF
    Machine learning is the branch of artificial intelligence that gives computers the ability to learn patterns from data without being explicitly programmed. Deep Learning is a set of cutting-edge machine learning algorithms inspired by how the human brain works. It allows models to learn feature hierarchies from the data themselves rather than relying on hand-crafted features, and it has been shown to significantly improve performance on challenging data analytics problems. In this tutorial, we will first provide an introduction to the theoretical foundations of neural networks and Deep Learning. Second, we will demonstrate how to use Deep Learning in the cloud in a distributed environment for Big Data analytics, combining Apache Spark and TensorFlow, Google’s in-house Deep Learning platform built for Big Data machine learning applications. Practical demonstrations will include character recognition and time series forecasting on Big Data sets. Attendees will be provided with code snippets that they can easily amend in order to analyze their own data. A related but shorter tutorial, focusing on Deep Learning on a single computer, was given at the Data Science Luxembourg Meetup in April 2016; attended by 70 people, it was the most attended event of this Meetup series in Luxembourg since its beginning.
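
    The tutorial's own notebooks are not reproduced here, but a minimal stand-alone sketch of a character-recognition demo in TensorFlow's Keras API (architecture and hyperparameters are illustrative assumptions) looks like this:

```python
# Minimal character-recognition sketch in TensorFlow, in the spirit of the
# tutorial's demo. Architecture and hyperparameters are assumptions, not
# the tutorial's actual code.
import tensorflow as tf

# MNIST handwritten digits: 60k training images, 28x28 grayscale.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),  # one unit per digit
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```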

    Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation

    Full text link
    TensorFlow has been the most widely adopted Machine/Deep Learning framework. However, little exists in the literature that provides a thorough understanding of the capabilities TensorFlow offers for the distributed training of large ML/DL models that need computation and communication at scale. The most commonly used distributed training approaches for TF can be categorized as follows: 1) Google Remote Procedure Call (gRPC); 2) gRPC+X, where X is InfiniBand Verbs, Message Passing Interface (MPI), or GPUDirect RDMA; and 3) No-gRPC: Baidu Allreduce with MPI, Horovod with MPI, and Horovod with NVIDIA NCCL. In this paper, we provide an in-depth performance characterization and analysis of these distributed training approaches on various GPU clusters, including the Piz Daint system (ranked 6th on the Top500 list). We perform experiments to gain novel insights along the following vectors: 1) application-level scalability of DNN training; 2) effect of batch size on scaling efficiency; 3) impact of the MPI library used for No-gRPC approaches; and 4) type and size of DNN architectures. Based on these experiments, we present two key insights: 1) overall, No-gRPC designs achieve better performance than gRPC-based approaches for most configurations, and 2) the performance of No-gRPC is heavily influenced by the gradient aggregation done with Allreduce. Finally, we propose a truly CUDA-Aware MPI Allreduce design that exploits CUDA kernels and pointer caching to perform large reductions efficiently. Our proposed designs offer 5-17X better performance than NCCL2 for small and medium messages, and reduce latency by 29% for large messages. The proposed optimizations help Horovod-MPI achieve approximately 90% scaling efficiency for ResNet-50 training on 64 GPUs. Further, Horovod-MPI achieves 1.8X and 3.2X higher throughput than the native gRPC method for ResNet-50 and MobileNet, respectively, on the Piz Daint cluster. (Comment: 10 pages, 9 figures, submitted to IEEE IPDPS 2019 for peer review.)
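
    For context, the No-gRPC style the paper evaluates typically follows the standard Horovod usage pattern sketched below; this is generic Horovod code with an illustrative model, not the paper's benchmark suite:

```python
# Sketch of data-parallel DNN training with Horovod over MPI/NCCL (the
# "No-gRPC" category). Model, learning rate, and launch setup are
# illustrative; typically started as `mpirun -np 64 python train.py`.
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one process per GPU

# Pin each process to its own GPU.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

model = tf.keras.applications.ResNet50(weights=None, classes=1000)
# Scale the learning rate with the number of workers (common heuristic).
opt = tf.keras.optimizers.SGD(learning_rate=0.0125 * hvd.size())
# Wrap the optimizer so gradients are aggregated via Allreduce each step.
opt = hvd.DistributedOptimizer(opt)
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt)

# Keep all workers' initial weights consistent with rank 0's.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
# model.fit(train_dataset, callbacks=callbacks, epochs=...)
```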

    Framework for efficient transformation for complex medical data for improving analytical capability

    Get PDF
    Various technological advancements have already been adopted in the healthcare sector. This adoption facilitates the automatic generation of medical data that can be autonomously forwarded to a designated hub in the form of cloud storage units. However, such technologies also produce complex medical data on a massive scale, which imposes significant overhead on analytical operations and wastes storage. Therefore, the proposed system implements a novel transformation technique that uses a template-based structure over the cloud to generate structured data from highly unstructured data in a non-conventional manner. The contribution of the proposed methodology is that it offers faster processing and storage optimization. The study outcome also shows that the proposed scheme outperforms existing data transformation schemes.
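
    The paper's template scheme is not detailed in the abstract; as a hedged illustration of template-based structuring, the sketch below maps free-text notes onto a hypothetical field template with regular expressions:

```python
# Minimal sketch of template-based structuring of unstructured medical
# records. The template, field names, and regular expressions are
# illustrative assumptions, not the paper's actual scheme.
import json
import re

# Hypothetical template: one named regex per field to extract.
TEMPLATE = {
    "patient_id": r"Patient ID[:\s]+(\w+)",
    "heart_rate": r"HR[:\s]+(\d+)\s*bpm",
    "bp": r"BP[:\s]+(\d+/\d+)",
}

def apply_template(raw_note: str) -> dict:
    """Map a free-text note onto the template's structured fields."""
    record = {}
    for field, pattern in TEMPLATE.items():
        match = re.search(pattern, raw_note)
        record[field] = match.group(1) if match else None
    return record

note = "Patient ID: A1043. Vitals stable, HR: 72 bpm, BP: 120/80."
print(json.dumps(apply_template(note), indent=2))
```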

    An Object-Oriented Bayesian Framework for the Detection of Market Drivers

    Get PDF
    We use Object Oriented Bayesian Networks (OOBNs) to analyze complex ties in the equity market and to detect drivers for the Standard & Poor's 500 (S&P 500) index. To this aim, we consider a vast number of indicators drawn from various investment areas (Value, Growth, Sentiment, Momentum, and Technical Analysis), and, with the aid of OOBNs, we study the role they played over time in influencing the dynamics of the S&P 500. Our results highlight that the centrality of the indicators varies in time, and they offer a starting point for further inquiries devoted to combining OOBNs with trading platforms.
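
    OOBNs compose networks from reusable sub-network objects; common Python libraries such as pgmpy model only flat networks, so the sketch below is a simplified, non-object-oriented stand-in in which every variable, edge, and probability is an illustrative assumption rather than the paper's model:

```python
# Simplified, flat Bayesian-network stand-in for the paper's OOBN: two
# hypothetical indicators influencing S&P 500 direction. All structure
# and probabilities here are illustrative assumptions.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([("Momentum", "SP500"), ("Sentiment", "SP500")])

cpd_mom = TabularCPD("Momentum", 2, [[0.5], [0.5]])    # up / down
cpd_sent = TabularCPD("Sentiment", 2, [[0.6], [0.4]])  # bullish / bearish
cpd_sp = TabularCPD(
    "SP500", 2,
    # P(SP500 | Momentum, Sentiment); columns over parent combinations.
    [[0.8, 0.6, 0.5, 0.2],
     [0.2, 0.4, 0.5, 0.8]],
    evidence=["Momentum", "Sentiment"], evidence_card=[2, 2],
)
model.add_cpds(cpd_mom, cpd_sent, cpd_sp)

infer = VariableElimination(model)
print(infer.query(["SP500"], evidence={"Momentum": 0, "Sentiment": 0}))
```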

    SIGNIFICANCE OF ANALYTICS

    Get PDF
    Although marketers have long used analytics as a tool to predict and drive consumer behavior, it remains a relatively new application of the science. Technology continues to evolve and offers ever more data choices and metrics for analysis, increasing marketers' ability to reach their audience. This article surveys several sectors of use, including real estate, social media, and healthcare, and theorizes about the impact analytics will have in the future as the technological means of interpreting data catch up with the sheer amount of real-time information available for potential use, especially with the development of the Internet of Things and rising concerns around data use, data protection, and copyright.

    Boosting big data streaming applications in clouds with burstFlow

    Get PDF
    The rapid growth of stream applications in financial markets, health care, education, social media, and sensor networks represents a remarkable milestone for data processing and analytics in recent years, leading to new challenges in handling Big Data in real time. Traditionally, a single cloud infrastructure often hosts the deployment of Stream Processing applications because it has extensive and adaptive virtual computing resources. Hence, data sources send data from distant and diverse locations to the cloud infrastructure, increasing application latency. The cloud infrastructure may be geographically distributed, and it requires running a set of frameworks to handle communication. These frameworks often comprise a Message Queue System and a Stream Processing Framework, and they may span multiple clouds, deploying each service in a different cloud and communicating via high-latency network links. This creates challenges in meeting real-time application requirements, because the data streams have different and unpredictable latencies, forcing cloud providers' communication systems to adjust continually to changes in the environment. Previous works explore static micro-batching, demonstrating its potential to overcome communication issues. This paper introduces BurstFlow, a tool for enhancing communication between data sources located at the edges of the Internet and Big Data Stream Processing applications located in cloud infrastructures. BurstFlow introduces a strategy for adjusting micro-batch sizes dynamically according to the time required for communication and computation. BurstFlow also presents an adaptive data partition policy for distributing incoming streams across the available machines by considering their memory and CPU capacities. The experiments use a real-world multi-cloud deployment and show that BurstFlow can reduce execution time by up to 77% compared to state-of-the-art solutions, improving CPU efficiency by up to 49%.
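
    The core idea of dynamic micro-batch sizing can be sketched as follows; this is an illustrative heuristic with assumed thresholds, not BurstFlow's actual policy:

```python
# Illustrative heuristic for dynamic micro-batch sizing: grow the batch
# when communication dominates (amortize per-batch overhead), shrink it
# when computation lags (bound latency). Thresholds and growth factors
# are assumptions, not BurstFlow's published policy.
def adjust_batch_size(batch_size, comm_time, compute_time,
                      min_size=100, max_size=100_000):
    if comm_time > 2 * compute_time:
        # Network-bound: larger batches amortize per-message latency.
        batch_size = min(int(batch_size * 1.5), max_size)
    elif compute_time > 2 * comm_time:
        # Compute-bound: smaller batches keep end-to-end latency low.
        batch_size = max(int(batch_size // 1.5), min_size)
    return batch_size

# Example: 80 ms communication vs. 20 ms computation per micro-batch.
print(adjust_batch_size(1_000, comm_time=0.08, compute_time=0.02))
```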

    Performance Evaluation Analysis of Spark Streaming Backpressure for Data-Intensive Pipelines

    Get PDF
    A significant rise in the adoption of streaming applications has changed decision-making processes in the last decade. This movement has led to the emergence of several Big Data technologies for in-memory processing, such as Apache Storm, Spark, Heron, Samza, and Flink, among others. Spark Streaming, a widespread open-source implementation, processes data-intensive applications that often require large amounts of memory. However, Spark's Unified Memory Manager cannot properly manage sudden or intensive data surges and their related in-memory caching needs, resulting in performance and throughput degradation, high latency, a large number of garbage collection operations, out-of-memory issues, and data loss. This work presents a comprehensive performance evaluation of Spark Streaming backpressure to investigate the hypothesis that it could support data-intensive pipelines under specific pressure requirements. The results reveal that backpressure is suitable only for small and medium pipelines, whether for stateless or stateful applications. Furthermore, the evaluation points out the Spark Streaming limitations that lead to in-memory issues for data-intensive pipelines and stateful applications, and it indicates potential solutions.
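
    Backpressure in Spark Streaming is enabled through configuration; the keys below are real Spark settings, while the surrounding skeleton is illustrative rather than the paper's benchmark code:

```python
# Enabling Spark Streaming's backpressure mechanism. The two
# "spark.streaming.backpressure.*" keys are real Spark settings; the
# application skeleton around them is illustrative.
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

conf = (SparkConf()
        .setMaster("local[2]")
        .setAppName("backpressure-demo")
        # Let Spark throttle ingestion based on current processing rates.
        .set("spark.streaming.backpressure.enabled", "true")
        # Cap the rate used before the first rate estimate is available.
        .set("spark.streaming.backpressure.initialRate", "1000"))

sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, batchDuration=1)  # 1-second micro-batches
# A source would then be attached and the context started, e.g.:
# lines = ssc.socketTextStream("localhost", 9999)
# ssc.start(); ssc.awaitTermination()
```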

    Digital Pathology: The Time Is Now to Bridge the Gap between Medicine and Technological Singularity

    Get PDF
    Digitalization of imaging in radiology is a reality in several healthcare institutions worldwide. The challenges of filing, confidentiality, and manipulation have been brilliantly solved in radiology. However, the digitalization of hematoxylin- and eosin-stained routine histological slides has moved slowly. Although external quality assurance is a reality for pathologists, with most continuing medical education programs utilizing virtual microscopy, abandoning traditional glass slides for routine diagnostics is far from the perspective of many departments of laboratory medicine and pathology. Digital pathology images are captured by scanning, and whole-slide imaging/virtual microscopy can be obtained by (robotic) microscopy of an entire histological glass slide. Since 1986, services using telepathology to transfer anatomic pathology images between distant locations have benefited countless patients globally, including at the University of Alberta. Its use for specialist recertification or re-validation by the Royal College of Pathologists of Canada, belonging to the Royal College of Physicians and Surgeons of Canada, and by the College of American Pathologists is a milestone in virtual reality. Challenges such as high bandwidth requirements, electronic platforms, and the stability of operating systems have been targeted and are improving enormously. The encryption of digital images may become a requirement for the accreditation of laboratory services. Quantum computing exploits quantum-mechanical phenomena, such as superposition and entanglement: unlike binary digital electronic computers based on transistors, where data are encoded into binary digits (bits) with two possible states (0 and 1), quantum computing uses quantum bits (qubits), which can be in superpositions of states. The use of quantum computing protocols on encrypted data is crucial for the permanent implementation of virtual pathology in hospitals and universities. Quantum computing may well represent the technological singularity that creates new classifications and taxonomic rules in medicine.
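
    The qubit superposition mentioned above has a standard one-line formalization (textbook notation, not drawn from the article):

```latex
% A qubit's state is a normalized superposition of the basis states
% |0> and |1>; measurement yields 0 with probability |alpha|^2 and
% 1 with probability |beta|^2.
\[
  \lvert\psi\rangle = \alpha\,\lvert 0\rangle + \beta\,\lvert 1\rangle,
  \qquad \alpha,\beta \in \mathbb{C},\quad
  \lvert\alpha\rvert^{2} + \lvert\beta\rvert^{2} = 1
\]
```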