514 research outputs found

    Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures

    Get PDF
    One of the significant shifts in next-generation computing technologies will certainly be in the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD landmark, has evolved into a widely deployed BD operating system. Its new features include a federation structure and many associated frameworks, which give Hadoop 3.x the maturity to serve different markets. This dissertation addresses two leading issues in exploiting BD and large-scale data analytics on the Hadoop platform: (i) scalability, which directly affects system performance and overall throughput, addressed using portable Docker containers; and (ii) security, which spreads the adoption of data protection practices among practitioners, addressed using access controls. An Enhanced MapReduce Environment (EME), an OPportunistic and Elastic Resource Allocation (OPERA) scheduler, a BD Federation Access Broker (BDFAB), and a Secure Intelligent Transportation System (SITS) with a multi-tier architecture for data streaming to the cloud are the main contributions of this thesis.

    Blockchain for secured IoT and D2D applications over 5G cellular networks : a thesis by publications presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer and Electronics Engineering, Massey University, Albany, New Zealand

    Get PDF
    Author's Declaration: "In accordance with Sensors, SpringerOpen, and IEEE's copyright policy, this thesis contains the accepted and published version of each manuscript as the final version. Consequently, the content is identical to the published versions." The Internet of Things (IoT) is in continuous development with ever-growing popularity. It brings significant benefits by enabling humans and the physical world to interact using various technologies, from small sensors to cloud computing. IoT devices and networks are appealing targets of various cyber attacks and can be hampered by malicious intervening attackers if the IoT is not appropriately protected. IoT security and privacy remain a major challenge due to characteristics of the IoT such as heterogeneity, scalability, the nature of the data, and operation in open environments. Moreover, many existing cloud-based solutions for IoT security rely on central remote servers over vulnerable Internet connections. The decentralized and distributed nature of blockchain technology has attracted significant attention as a suitable solution to tackle the security and privacy concerns of the IoT and device-to-device (D2D) communication. This thesis explores the possible adoption of blockchain technology to address the security and privacy challenges of the IoT under the 5G cellular system. The thesis makes four novel contributions.

    First, a Multi-layer Blockchain Security (MBS) model is proposed to protect IoT networks while simplifying the implementation of blockchain technology. The concept of clustering is utilized to facilitate multi-layer architecture deployment and increase scalability. The K unknown clusters are formed within the IoT network by applying a hybrid evolutionary computation algorithm combining Simulated Annealing (SA) and Genetic Algorithms (GA) to structure the overlay nodes. The open-source Hyperledger Fabric (HLF) blockchain platform is deployed for the proposed model's development. Base stations adopt a global blockchain approach to communicate with each other securely. The quantitative results demonstrate that the proposed clustering algorithm performs well compared to earlier reported methods. The proposed lightweight blockchain model is also better suited to balancing network latency and throughput than a traditional global blockchain.

    Next, a model is proposed to integrate IoT systems and blockchain by implementing the permissioned Hyperledger Fabric blockchain. The security of edge computing devices is provided by employing a local authentication process. A lightweight mutual authentication and authorization solution is proposed to ensure the security of tiny IoT devices within the ecosystem. In addition, the proposed model provides traceability for the data generated by the IoT devices. The performance of the proposed model is validated through a practical implementation by measuring performance metrics such as transaction throughput and latency, resource consumption, and network usage. The results indicate that the proposed platform with the HLF implementation is promising for the security of resource-constrained IoT devices and is scalable for deployment in various IoT scenarios.

    Despite the increasing development of blockchain platforms, there is still no comprehensive method for adopting blockchain technology in IoT systems, due to the blockchain's limited capability to process the substantial transaction requests generated by a massive number of IoT devices. The Fabric comprises various components such as smart contracts, peers, endorsers, validators, committers, and orderers. A comprehensive empirical model is proposed that measures HLF's performance and identifies potential performance bottlenecks to better meet the requirements of blockchain-based IoT applications. The implementation of HLF on distributed large-scale IoT systems is proposed. The performance of HLF is evaluated in terms of throughput, latency, network sizes, scalability, and the number of peers serviceable by the platform. The experimental results demonstrate that the proposed framework can provide a detailed, real-time performance evaluation of blockchain systems for large-scale IoT applications.

    The diversity and the sheer increase in the number of connected IoT devices have brought significant concerns about storing and protecting the large volume of IoT data. Dependence on a centralized server solution imposes significant trust issues and makes the system vulnerable to security risks. A layer-based distributed data storage design and implementation of a blockchain-enabled large-scale IoT system is proposed to mitigate these challenges using the HLF platform for distributed ledger solutions. The need for a centralized server and a third-party auditor is eliminated by leveraging HLF peers, which perform transaction verification and record auditing in a big data system with the help of blockchain technology. The HLF blockchain facilitates storing lightweight verification tags on the blockchain ledger, while the actual metadata is stored in the off-chain big data system to reduce communication overheads and enhance data integrity. Finally, experiments are conducted to evaluate the performance of the proposed scheme in terms of throughput, latency, and communication and computation costs. The results indicate the feasibility of the proposed solution for retrieving and storing the provenance of large-scale IoT data within the big data ecosystem using the HLF blockchain.
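
    The off-chain storage design described above can be illustrated with a minimal sketch: a lightweight verification tag (here a SHA-256 hash) is kept on a ledger stand-in while the actual metadata lives in an off-chain store, and integrity is checked on retrieval. The function names and in-memory dictionaries below are hypothetical illustrations, not the thesis's Hyperledger Fabric implementation.

# Minimal sketch of the on-chain tag / off-chain data idea; the stores and
# helper names are illustrative stand-ins, not the thesis's HLF code.
import hashlib
import json

ledger = {}     # stand-in for verification tags kept on the blockchain ledger
big_data = {}   # stand-in for the off-chain big data store holding the metadata

def put_record(record_id: str, metadata: dict) -> None:
    """Store metadata off-chain and its hash tag 'on-chain'."""
    payload = json.dumps(metadata, sort_keys=True).encode()
    big_data[record_id] = metadata
    ledger[record_id] = hashlib.sha256(payload).hexdigest()

def verify_record(record_id: str) -> bool:
    """Recompute the tag from the off-chain copy and compare with the ledger."""
    payload = json.dumps(big_data[record_id], sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest() == ledger[record_id]

put_record("sensor-42", {"temp": 21.5, "ts": "2021-06-01T12:00:00Z"})
print(verify_record("sensor-42"))  # True unless the off-chain copy was tampered with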

    Secure entity authentication

    Get PDF
    According to Wikipedia, authentication is the act of confirming the truth of an attribute of a single piece of data claimed true by an entity. Specifically, entity authentication is the process by which an agent in a distributed system gains confidence in the identity of a communicating partner (Bellare et al.). Legacy password authentication is still the most popular mechanism; however, it suffers from many limitations, such as hacking through social engineering techniques, dictionary attacks, or database leaks. To address the security concerns of legacy password-based authentication, many new authentication factors have been introduced, such as PINs (Personal Identification Numbers) delivered through out-of-band channels, human biometrics, and hardware tokens. However, each of these authentication factors has its own inherent weaknesses and security limitations. For example, phishing is still effective even when out-of-band channels are used to deliver PINs. In this dissertation, three types of secure entity authentication schemes are developed to alleviate the weaknesses and limitations of existing authentication mechanisms: (1) an end-user authentication scheme based on Network Round-Trip Time (NRTT) that complements location-based authentication mechanisms; (2) an Apache Hadoop authentication mechanism based on Trusted Platform Module (TPM) technology; and (3) a web server authentication mechanism for phishing detection using NRTT as a new detection factor.

    In the first work, a new authentication factor based on NRTT is presented. Two research challenges (i.e., the secure measurement of NRTT and network instabilities) are addressed to show that NRTT can be used to uniquely and securely identify login locations and hence can support location-based web authentication mechanisms. The experiments and analysis show that NRTT has superior usability, deployability, security, and performance properties compared to state-of-the-art web authentication factors.

    In the second work, departing from the Kerberos-centric approach, an authentication framework for Hadoop that utilizes Trusted Platform Module (TPM) technology is proposed. It is shown that pushing security down to the hardware level, in conjunction with software techniques, provides better protection than software-only solutions. The proposed approach provides significant security guarantees against insider threats that manipulate the execution environment without the consent of legitimate clients. Extensive experiments are conducted to validate the performance and security properties of the proposed approach. Moreover, the correctness and security guarantees are formally proved via Burrows-Abadi-Needham (BAN) logic.

    In the third work, together with a phishing victim identification algorithm, NRTT is used as a new phishing detection feature to improve the detection accuracy of existing phishing detection approaches. The state-of-the-art phishing detection methods fall into two categories: heuristics and blacklists. The experiments show that combining NRTT with existing heuristics can improve the overall detection accuracy while maintaining a low false positive rate. In the future, to develop a more robust and efficient phishing detection scheme, it is paramount for phishing detection approaches to carefully select features that strike the right balance between detection accuracy and robustness in the face of potential manipulations. In addition, leveraging Deep Learning (DL) algorithms to improve the performance of phishing detection schemes could be a viable alternative to traditional machine learning algorithms (e.g., SVM, LR), especially when handling complex and large-scale datasets.
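
    As a rough illustration of the NRTT idea discussed above, the sketch below estimates round-trip time from TCP connect latency and compares it against an enrolled value. This is not the dissertation's secure measurement protocol; the host, tolerance, and enrolled profile are hypothetical, and a real deployment would measure NRTT on the server side with protections against manipulation.

# Naive NRTT illustration: median TCP connect time compared to an enrolled
# profile. Host, port, tolerance, and enrolled_rtt are hypothetical values.
import socket
import statistics
import time

def measure_rtt(host: str, port: int = 443, samples: int = 5) -> float:
    """Return the median TCP connect time (seconds) as a rough NRTT estimate."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=3):
            times.append(time.perf_counter() - start)
    return statistics.median(times)

def matches_enrolled_location(observed_rtt: float, enrolled_rtt: float,
                              tolerance: float = 0.25) -> bool:
    """Accept the login only if the observed NRTT is close to the enrolled one."""
    return abs(observed_rtt - enrolled_rtt) <= tolerance * enrolled_rtt

rtt = measure_rtt("example.com")
print(matches_enrolled_location(rtt, enrolled_rtt=0.045))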

    Big Data Security (Volume 3)

    Get PDF
    After a short description of the key concepts of big data, the book explores the secrecy and security threats posed especially by cloud-based data storage. It delivers conceptual frameworks and models along with case studies of recent technology.

    System for monitoring and supporting the treatment of sleep apnea using IoT and big data

    Full text link
    [EN] Sleep apnea has become the sleep disorder causing the greatest concern in recent years due to its morbidity and mortality, higher medical care costs, and the poor quality of life it causes. Some proposals have addressed sleep apnea in elderly people, but they still have some technical limitations. For these reasons, this paper presents an innovative system based on fog and cloud computing technologies which, in combination with IoT and big data platforms, offers new opportunities to build novel and innovative services for supporting sleep apnea treatment and to overcome the current limitations. In particular, the system is built on several low-power wireless networks with heterogeneous smart devices (i.e., sensors and actuators). In the fog, an edge node (Smart IoT Gateway) provides IoT connectivity and interoperability and pre-processes IoT data to detect, in real time, events that might endanger the elderly's health and to act accordingly. In the cloud, a Generic Enabler Context Broker manages, stores, and injects data into the big data analyzer for further processing and analysis. The system's performance and subjective applicability are evaluated using datasets of over 30 GB and a questionnaire completed by medical specialists, respectively. Results show that the system's data analytics improve health professionals' decision making when monitoring and guiding sleep apnea treatment, as well as improving elderly people's quality of life. (C) 2018 Elsevier B.V. All rights reserved. This research was supported by the Ecuadorian Government through the Secretary of Higher Education, Science, Technology, and Innovation (SENESCYT) and has received funding from the European Union's Horizon 2020 research and innovation program as part of the ACTIVAGE project under Grant 732679 and the Interoperability of Heterogeneous IoT Platforms project (INTER-IoT) under Grant 687283. Yacchirema-Vargas, DC.; Sarabia-Jácome, DF.; Palau Salvador, CE.; Esteve Domingo, M. (2018). System for monitoring and supporting the treatment of sleep apnea using IoT and big data. Pervasive and Mobile Computing, 50:25-40. https://doi.org/10.1016/j.pmcj.2018.07.007
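
    A minimal sketch of the kind of real-time edge rule such a Smart IoT Gateway might apply is shown below, assuming a hypothetical oxygen-saturation threshold and window length; the paper's actual clinical criteria and Context Broker integration are not reproduced here.

# Hedged sketch of edge pre-processing at a gateway; the threshold, window
# size, and helper functions are illustrative, not the paper's design.
from collections import deque

WINDOW = 10          # number of recent samples kept at the gateway
SPO2_THRESHOLD = 90  # hypothetical desaturation threshold (%)

recent = deque(maxlen=WINDOW)

def on_new_sample(spo2: float) -> None:
    """Pre-process one oxygen-saturation reading and act locally if needed."""
    recent.append(spo2)
    if len(recent) == WINDOW and all(v < SPO2_THRESHOLD for v in recent):
        trigger_local_alarm()   # act at the edge without waiting for the cloud
    forward_to_cloud(spo2)      # still stream the reading for cloud analytics

def trigger_local_alarm() -> None:
    print("Sustained desaturation detected: alert raised at the gateway")

def forward_to_cloud(spo2: float) -> None:
    pass  # in the real system this would publish to the cloud Context Broker

for value in [95, 94, 88, 87, 86, 85, 84, 85, 86, 87, 88, 89]:
    on_new_sample(value)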

    Preserving the Quality of Architectural Tactics in Source Code

    Get PDF
    In any complex software system, strong interdependencies exist between requirements and software architecture. Requirements drive architectural choices while also being constrained by the existing architecture and by what is economically feasible. This makes it advisable to concurrently specify the requirements, to devise and compare alternative architectural design solutions, and ultimately to make a series of design decisions in order to satisfy each of the quality concerns. Unfortunately, anecdotal evidence has shown that architectural knowledge tends to be tacit in nature, stored in the heads of people, and lost over time. Therefore, developers often lack comprehensive knowledge of the underlying architectural design decisions and inadvertently degrade the quality of the architecture while performing maintenance activities. In practice, this problem can be addressed by preserving the relationships between the requirements, the architectural design decisions, and their implementations in the source code, and then using this information to keep developers aware of critical architectural aspects of the code. This dissertation presents a novel approach that utilizes machine learning techniques to recover and preserve the relationships between architecturally significant requirements, architectural decisions, and their realizations in the implemented code. Our approach for recovering architectural decisions includes two primary stages: training and classification. In the first stage, the classifier is trained using code snippets of different architectural decisions collected from various software systems. During this phase, the classifier learns the terms that developers typically use to implement each architectural decision. These "indicator terms" represent method names, variable names, comments, or the development APIs that developers inevitably use to implement various architectural decisions. A probabilistic weight is then computed for each potential indicator term with respect to each type of architectural decision. The weight estimates how strongly an indicator term represents a specific architectural tactic/decision. For example, a term such as "pulse" is highly representative of the heartbeat tactic but occurs infrequently in the authentication tactic. After learning the indicator terms, the classifier can compute the likelihood that any given source file implements a specific architectural decision. The classifier was evaluated through several different experiments, including classical cross-validation over code snippets of 50 open source projects and on the entire source code of a large-scale software system. Results showed that the classifier can reliably recognize a wide range of architectural decisions.

    The technique introduced in this dissertation is used to develop the Archie tool suite. Archie is a plug-in for Eclipse designed to detect a wide range of architectural design decisions in the code and to protect them from potential degradation during maintenance activities. It has several features for performing change impact analysis of architectural concerns at both the code and design level and for proactively keeping developers informed of underlying architectural decisions during maintenance activities. Archie is at the stage of technology transfer at the US Department of Homeland Security, where it is used purely to detect and monitor security choices. Furthermore, this outcome is integrated into the Department of Homeland Security's Software Assurance Marketplace (SWAMP) to advance research and development of secure software systems.
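
    The indicator-term weighting described above can be sketched with a toy, Naive-Bayes-style likelihood scorer: smoothed term counts per tactic act as weights, and a new snippet is assigned the tactic whose terms it matches best. The training snippets and scoring below are simplified stand-ins, not the dissertation's actual classifier or weighting formula.

# Toy indicator-term scorer; training data and smoothing are illustrative only.
from collections import Counter
import math
import re

training = {  # a few hypothetical code snippets per architectural tactic
    "heartbeat": ["send pulse to monitor", "pulse interval timer missed beat"],
    "authentication": ["verify password hash", "login token credential check"],
}

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

# Count how often each term appears in the snippets of each tactic.
weights, vocab = {}, set()
for tactic, snippets in training.items():
    counts = Counter(t for s in snippets for t in tokenize(s))
    vocab |= set(counts)
    weights[tactic] = counts

def score(source: str, tactic: str) -> float:
    """Log-likelihood of the source's terms under a tactic (add-one smoothing)."""
    counts, total = weights[tactic], sum(weights[tactic].values())
    return sum(math.log((counts[t] + 1) / (total + len(vocab)))
               for t in tokenize(source))

snippet = "if pulse timer expires, notify the monitor"
print(max(weights, key=lambda tactic: score(snippet, tactic)))  # likely "heartbeat"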

    Big Data Knowledge System in Healthcare

    Get PDF
    Healthcare systems are rapidly accumulating large amounts of data, driven by record keeping, compliance and regulatory requirements, and patient care. Advances in healthcare systems will rapidly enlarge the volume of health records that are accessible electronically. Concurrently, fast progress has been made in clinical analytics: for example, new techniques for analyzing large volumes of data and gleaning new business insights from that analysis are part of what is known as big data. Big data also holds the promise of supporting a wide range of medical and healthcare functions, including, among others, disease surveillance, clinical decision support, and population health management. Hence, an effective big-data-based knowledge management system is needed to monitor patients and present clinical decisions to the doctor. This chapter proposes a big-data-based knowledge management system to support clinical decisions. The proposed knowledge system is built on a variety of databases, such as Electronic Health Records (EHR), medical imaging data, unstructured clinical notes, and genetic data. The proposed methodology communicates asynchronously with the different data sources and produces multiple alternative decisions for the doctor.
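
    A brief sketch of the asynchronous fan-out idea is given below: the system queries several data sources concurrently and combines the per-source findings into alternative decisions. The source names, query stub, and combination step are hypothetical illustrations, not the chapter's actual implementation.

# Hedged sketch of asynchronous querying of multiple clinical data sources;
# all names and the decision-combination logic are illustrative assumptions.
import asyncio

async def query_source(name: str, patient_id: str) -> dict:
    """Stand-in for fetching one data source (EHR, imaging, notes, genetics)."""
    await asyncio.sleep(0.1)  # simulate I/O latency of the real back end
    return {"source": name, "patient": patient_id, "findings": f"{name} summary"}

async def propose_decisions(patient_id: str) -> list[str]:
    sources = ["EHR", "MedicalImaging", "ClinicalNotes", "Genetics"]
    results = await asyncio.gather(*(query_source(s, patient_id) for s in sources))
    # Combine per-source findings into alternative decisions for the clinician.
    return [f"Option based on {r['source']}: review {r['findings']}" for r in results]

print(asyncio.run(propose_decisions("patient-001")))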