
Big Data Computing Using Cloud-Based Technologies, Challenges and Future Perspectives

Authors: Alam Mansaf; Khan Samiya; Shakil Kashish Ara
Publication date: 24/11/2017

The vast amounts of data generated on a regular basis by devices and Internet-based sources constitute big data. This data can be processed and analyzed to develop useful applications for specific domains. Several mathematical and data analytics techniques have found use in this sphere, which has given rise to the development of computing models and tools for big data computing. However, the storage and processing requirements are overwhelming for traditional systems and technologies. Therefore, there is a need for infrastructures that can adjust storage and processing capability in accordance with changing data dimensions. Cloud computing serves as a potential solution to this problem. However, big data computing in the cloud has its own set of challenges and research issues. This chapter surveys the big data concept, discusses the mathematical and data analytics techniques that can be used for big data, and gives a taxonomy of the existing tools, frameworks and platforms available for different big data computing models. It also evaluates the viability of cloud-based big data computing, examines existing challenges and opportunities, and provides future research directions in this field.

arXiv.org e-Print Archive

A Comparative Taxonomy and Survey of Public Cloud Infrastructure Vendors

Authors: Devetsikiotis Michael; Papapanagiotou Ioannis; Rimal Bhaskar Prasad; Sikeridis Dimitrios
Publication date: 28/01/2018

An increasing number of technology enterprises are adopting cloud-native architectures to offer their web-based products, moving away from privately owned data centers and relying exclusively on cloud service providers. As a result, the number of cloud vendors has recently increased, along with the estimated annual revenue they share. However, when selecting one provider's cloud service over the competition, we observe a lack of universal common ground in terms of terminology, functionality of services and billing models. This is an important gap, especially in the industry's new reality, where each cloud provider has moved towards its own service taxonomy while the number of specialized services has grown exponentially. This work discusses the cloud services offered by the four cloud vendors that dominate in terms of current market share. We provide a taxonomy of their services and sub-services that designates major service families, namely computing, storage, databases, analytics, data pipelines, machine learning, and networking. The aim of such clustering is to indicate similarities, common design approaches and functional differences among the offered services. The outcomes are essential both for individual researchers and for larger enterprises attempting to identify the set of cloud services that will fully meet their needs without compromise. We acknowledge that this is a dynamic industry, where new services arise constantly and old ones undergo important updates. Even so, this study paints a solid picture of current offerings and highlights the directions that cloud service providers are following.

arXiv.org e-Print Archive

Analytics for the Internet of Things: A Survey

Authors: Hall Wendy; Siow Eugene; Tiropanis Thanassis
Publication date: 03/07/2018

The Internet of Things (IoT) envisions a world-wide, interconnected network of smart physical entities. These physical entities generate a large amount of data in operation, and as the IoT gains momentum in terms of deployment, the combined scale of those data seems destined to continue to grow. Increasingly, applications for the IoT involve analytics. Data analytics is the process of deriving knowledge from data, generating value such as actionable insights. This article reviews work in the IoT and big data analytics from the perspective of their utility in creating efficient, effective and innovative applications and services for a wide spectrum of domains. We review the broad vision for the IoT as it is shaped in various communities, examine the application of data analytics across IoT domains, provide a categorisation of analytic approaches and propose a layered taxonomy from IoT data to analytics. This taxonomy provides us with insights into the appropriateness of analytical techniques, which in turn shapes a survey of enabling technology and infrastructure for IoT analytics. Finally, we look at some tradeoffs for analytics in the IoT that can shape future research.

arXiv.org e-Print Archive

The Role of Big Data Analytics in Industrial Internet of Things

Authors: Imran Muhammad; Jayaraman Prem Prakash; Perera Charith; Rehman Muhammad Habib ur; Salah Khaled; Yaqoob Ibrar
Publication date: 11/04/2019

Big data production in the industrial Internet of Things (IIoT) is evident due to the massive deployment of sensors and Internet of Things (IoT) devices. However, big data processing is challenging due to limited computational, networking and storage resources at the IoT device end. Big data analytics (BDA) is expected to provide operational- and customer-level intelligence in IIoT systems. Although numerous studies on IIoT and BDA exist, only a few have explored the convergence of the two paradigms. In this study, we investigate recent BDA technologies, algorithms and techniques that can lead to the development of intelligent IIoT systems. We devise a taxonomy by classifying and categorising the literature on the basis of important parameters (e.g. data sources, analytics tools, analytics techniques, requirements, industrial analytics applications and analytics types). We present the frameworks and case studies of various enterprises that have benefited from BDA. We also enumerate the considerable opportunities introduced by BDA in IIoT. Finally, we identify and discuss the key challenges that remain to be addressed as future research directions.

arXiv.org e-Print Archive


Computational Strategies for Scalable Genomics Analysis

Authors: Shi Lizhen; Wang Zhong
Publication venue: eScholarship, University of California
Publication date: 06/12/2019

The revolution in next-generation DNA sequencing technologies is leading to explosive data growth in genomics, posing a significant challenge to the computing infrastructure and software algorithms for genomics analysis. Various big data technologies have been explored to scale up or scale out current bioinformatics solutions to mine big genomics data. In this review, we survey some of these exciting developments in the application of parallel distributed computing and special hardware to genomics. We comment on the pros and cons of each strategy in the context of ease of development, robustness, scalability, and efficiency. Although this review is written for an audience from the genomics and bioinformatics fields, it may also be informative for a computer science audience interested in genomics applications.

eScholarship - University of California

Role of Apache Software Foundation in Big Data Projects

Author: Akhtar Aleem
Publication date: 05/05/2020
Field of study

With the increasing amount of Big Data being generated each year, the tools and technologies developed for storing, processing and analyzing Big Data have also improved. Open-source software has been an important factor in the success and innovation of the Big Data field, and the Apache Software Foundation (ASF) has played a crucial role in this by providing a number of state-of-the-art projects that are free and open to the public. ASF has classified its projects into different categories. In this report, the projects listed under the Big Data category are analyzed in depth and discussed with reference to one of seven defined sub-categories. Our investigation shows that many of the Apache Big Data projects are autonomous, while some are built on other Apache projects and some work in conjunction with other projects to improve and ease development in the Big Data space.

arXiv.org e-Print Archive

ECHO: An Adaptive Orchestration Platform for Hybrid Dataflows across Cloud and Edge

Authors: Khochare Aakash; Ravindra Pushkara; Reddy Siva Prakash; Sharma Sarthak; Simmhan Yogesh; Varshney Prateeksha
Publication venue: Springer Science and Business Media LLC
Publication date: 04/07/2017

The Internet of Things (IoT) is offering unprecedented observational data that are used for managing Smart City utilities. Edge and Fog gateway devices are an integral part of IoT deployments, used to acquire real-time data and enact controls. Recently, Edge computing has been emerging as a first-class paradigm to complement Cloud-centric analytics. However, a key limitation is the lack of a platform-as-a-service for applications spanning Edge and Cloud. Here, we propose ECHO, an orchestration platform for dataflows across distributed resources. ECHO's hybrid dataflow composition can operate on diverse data models (streams, micro-batches and files) and interface with native runtime engines like TensorFlow and Storm to execute them. It manages the application's lifecycle, including container-based deployment and a registry for state management. ECHO can schedule the dataflow on different Edge, Fog and Cloud resources, and can also perform dynamic task migration between resources. We validate the ECHO platform by executing video analytics and sensor streams for Smart Traffic and Smart Utility applications on Raspberry Pi, NVidia TX1, ARM64 and Azure Cloud VM resources, and present our results.
Comment: 17 pages, 5 figures, 2 tables, submitted to ICSOC-201

arXiv.org e-Print Archive

A deep learning based solution for construction equipment detection: from development to deployment

Authors: Arabi Saeed; Haghighat Arya; Sharma Anuj
Publication date: 18/04/2019

This paper aims to provide researchers and engineering professionals with a practical and comprehensive deep-learning-based solution for detecting construction equipment, covering every step from initial development through to deployment. This paper focuses on the final step, deployment. The first phase of solution development involved data preparation, model selection, model training, and model evaluation. The second phase of the study comprises model optimization, application-specific embedded system selection, and economic analysis. Several embedded systems were proposed and compared. The results confirm superior real-time performance of the solutions, with accuracy consistently above 90%. The current study validates the practicality of deep-learning-based object detection solutions for construction scenarios. Moreover, the detailed knowledge presented in this study can be employed for several purposes, such as safety monitoring, productivity assessment, and managerial decision-making.
Comment: 17 pages, 16 figures, 6 tables

arXiv.org e-Print Archive

A Berkeley View of Systems Challenges for AI

Authors: Abbeel Pieter; Culler David; Ghodsi Ali; Goldberg Ken; Gonzalez Joseph E.; Hellerstein Joseph M.; Jordan Michael; Joseph Anthony D.; Katz Randy; Mahoney Michael W.; Patterson David; Popa Raluca Ada; Song Dawn; Stoica Ion
Publication date: 15/12/2017

With the increasing commoditization of computer vision, speech recognition and machine translation systems, and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production. These changes have been made possible by unprecedented levels of data and computation, by methodological advances in machine learning, by innovations in systems software and architectures, and by the broad accessibility of these technologies. The next generation of AI systems promises to accelerate these developments and increasingly impact our lives via frequent interactions and by making (often mission-critical) decisions on our behalf, often in highly personalized contexts. Realizing this promise, however, raises daunting challenges. In particular, we need AI systems that make timely and safe decisions in unpredictable environments, that are robust against sophisticated adversaries, and that can process ever-increasing amounts of data across organizations and individuals without compromising confidentiality. These challenges will be exacerbated by the end of Moore's Law, which will constrain the amount of data these technologies can store and process. In this paper, we propose several open research directions in systems, architectures, and security that can address these challenges and help unlock AI's potential to improve lives and society.
Comment: Berkeley Technical Report

arXiv.org e-Print Archive

Attacking Machine Learning models as part of a cyber kill chain

Author: Nguyen Tam N.
Publication date: 06/04/2018

Machine learning is gaining popularity in the network security domain as many more network-enabled devices get connected, as malicious activities become stealthier, and as new technologies like Software Defined Networking emerge. Compromising a machine learning model is a desirable goal for attackers. In fact, spammers have been quite successful at getting through machine-learning-enabled spam filters for years. While previous work has been done on adversarial machine learning, none has considered it within a defense-in-depth environment, in which correct classification alone may not be good enough. For the first time, this paper proposes a cyber kill chain for attacking machine learning models, together with a proof of concept. The intention is to provide a high-level attack model that inspires more secure processes in the research, design, and implementation of machine-learning-based security solutions.
Comment: 8 pages

arXiv.org e-Print Archive