36 research outputs found
FfDL : A Flexible Multi-tenant Deep Learning Platform
Deep learning (DL) is becoming increasingly popular in several application
domains and has made several new application features involving computer
vision, speech recognition and synthesis, self-driving automobiles, drug
design, etc. feasible and accurate. As a result, large scale on-premise and
cloud-hosted deep learning platforms have become essential infrastructure in
many organizations. These systems accept, schedule, manage and execute DL
training jobs at scale.
This paper describes the design, implementation and our experiences with
FfDL, a DL platform used at IBM. We describe how our design balances
dependability with scalability, elasticity, flexibility and efficiency. We
examine FfDL qualitatively through a retrospective look at the lessons learned
from building, operating, and supporting FfDL; and quantitatively through a
detailed empirical evaluation of FfDL, including the overheads introduced by
the platform for various deep learning models, the load and performance
observed in a real case study using FfDL within our organization, the frequency
of various faults observed including unanticipated faults, and experiments
demonstrating the benefits of various scheduling policies. FfDL has been
open-sourced.Comment: MIDDLEWARE 201
Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures
One of the significant shifts of the next-generation computing technologies will certainly be in
the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD
landmark, evolved as a widely deployed BD operating system. Its new features include
federation structure and many associated frameworks, which provide Hadoop 3.x with the
maturity to serve different markets. This dissertation addresses two leading issues involved in
exploiting BD and large-scale data analytics realm using the Hadoop platform. Namely,
(i)Scalability that directly affects the system performance and overall throughput using
portable Docker containers. (ii) Security that spread the adoption of data protection practices
among practitioners using access controls. An Enhanced Mapreduce Environment (EME),
OPportunistic and Elastic Resource Allocation (OPERA) scheduler, BD Federation Access Broker
(BDFAB), and a Secure Intelligent Transportation System (SITS) of multi-tiers architecture for
data streaming to the cloud computing are the main contribution of this thesis study
Computational Methods for Medical and Cyber Security
Over the past decade, computational methods, including machine learning (ML) and deep learning (DL), have been exponentially growing in their development of solutions in various domains, especially medicine, cybersecurity, finance, and education. While these applications of machine learning algorithms have been proven beneficial in various fields, many shortcomings have also been highlighted, such as the lack of benchmark datasets, the inability to learn from small datasets, the cost of architecture, adversarial attacks, and imbalanced datasets. On the other hand, new and emerging algorithms, such as deep learning, one-shot learning, continuous learning, and generative adversarial networks, have successfully solved various tasks in these fields. Therefore, applying these new methods to life-critical missions is crucial, as is measuring these less-traditional algorithms' success when used in these fields
Navigating the IoT landscape: Unraveling forensics, security issues, applications, research challenges, and future
Given the exponential expansion of the internet, the possibilities of
security attacks and cybercrimes have increased accordingly. However, poorly
implemented security mechanisms in the Internet of Things (IoT) devices make
them susceptible to cyberattacks, which can directly affect users. IoT
forensics is thus needed for investigating and mitigating such attacks. While
many works have examined IoT applications and challenges, only a few have
focused on both the forensic and security issues in IoT. Therefore, this paper
reviews forensic and security issues associated with IoT in different fields.
Future prospects and challenges in IoT research and development are also
highlighted. As demonstrated in the literature, most IoT devices are vulnerable
to attacks due to a lack of standardized security measures. Unauthorized users
could get access, compromise data, and even benefit from control of critical
infrastructure. To fulfil the security-conscious needs of consumers, IoT can be
used to develop a smart home system by designing a FLIP-based system that is
highly scalable and adaptable. Utilizing a blockchain-based authentication
mechanism with a multi-chain structure can provide additional security
protection between different trust domains. Deep learning can be utilized to
develop a network forensics framework with a high-performing system for
detecting and tracking cyberattack incidents. Moreover, researchers should
consider limiting the amount of data created and delivered when using big data
to develop IoT-based smart systems. The findings of this review will stimulate
academics to seek potential solutions for the identified issues, thereby
advancing the IoT field.Comment: 77 pages, 5 figures, 5 table
Recommended from our members
Operating system support for warehouse-scale computing
Modern applications are increasingly backed by large-scale data centres. Systems software in these data centre environments, however, faces substantial challenges: the lack of uniform resource abstractions makes sharing and resource management inefficient, infrastructure software lacks end-to-end access control mechanisms, and work placement ignores the effects of hardware heterogeneity and workload interference.
In this dissertation, I argue that uniform, clean-slate operating system (OS) abstractions designed to support distributed systems can make data centres more efficient and secure. I present a novel distributed operating system for data centres, focusing on two OS components: the abstractions for resource naming, management and protection, and the scheduling of work to compute resources.
First, I introduce a reference model for a decentralised, distributed data centre OS, based on pervasive distributed objects and inspired by concepts in classic 1980s distributed OSes. Translucent abstractions free users from having to understand implementation details, but enable introspection for performance optimisation. Fine-grained access control is supported by combining
storable, communicable identifier capabilities, and context-dependent, ephemeral handle capabilities. Finally, multi-phase I/O requests implement optimistically concurrent access to objects
while supporting diverse application-level consistency policies.
Second, I present the DIOS operating system, an implementation of my model as an extension to Linux. The DIOS system call API is centred around distributed objects, globally resolvable names, and translucent references that carry context-sensitive object meta-data. I illustrate how these concepts support distributed applications, and evaluate the performance of DIOS in microbenchmarks and a data-intensive MapReduce application. I find that it offers improved, finegrained isolation of resources, while permitting flexible sharing.
Third, I present the Firmament cluster scheduler, which generalises prior work on scheduling via minimum-cost flow optimisation. Firmament can flexibly express many scheduling policies using pluggable cost models; it makes high-quality placement decisions based on fine-grained information about tasks and resources; and it scales the flow-based scheduling approach to very large clusters. In two case studies, I show that Firmament supports policies that reduce colocation interference between tasks and that it successfully exploits flexibility in the workload to improve the energy efficiency of a heterogeneous cluster. Moreover, my evaluation shows that Firmament scales the minimum-cost flow optimisation to clusters of tens of thousands of machines while still making sub-second placement decisions.St John's College Supplementary Emolument Fund
DARP
Big Data Analytics in Static and Streaming Provenance
Thesis (Ph.D.) - Indiana University, Informatics and Computing,, 2016With recent technological and computational advances, scientists increasingly integrate
sensors and model simulations to understand spatial, temporal, social, and ecological
relationships at unprecedented scale. Data provenance traces relationships of entities over time, thus providing a unique view on over-time behavior under study. However,
provenance can be overwhelming in both volume and complexity; the now forecasting
potential of provenance creates additional demands. This dissertation focuses on Big Data analytics of static and streaming provenance. It develops filters and a non-preprocessing slicing technique for in-situ querying of static provenance. It presents a stream processing framework for online processing of provenance data at high receiving rate. While the former is sufficient for answering queries that are given prior to the application start (forward queries), the latter deals with queries whose targets are unknown beforehand (backward queries). Finally, it explores data mining on large collections of provenance and proposes a temporal representation of provenance that can reduce the high dimensionality while effectively supporting mining tasks like clustering, classification and association rules mining; and the temporal representation can be further applied to streaming provenance as well. The proposed techniques are verified through software prototypes applied to Big Data provenance captured from computer network data, weather models, ocean models, remote (satellite) imagery data, and agent-based simulations of agricultural decision making
Internet of Things From Hype to Reality
The Internet of Things (IoT) has gained significant mindshare, let alone attention, in academia and the industry especially over the past few years. The reasons behind this interest are the potential capabilities that IoT promises to offer. On the personal level, it paints a picture of a future world where all the things in our ambient environment are connected to the Internet and seamlessly communicate with each other to operate intelligently. The ultimate goal is to enable objects around us to efficiently sense our surroundings, inexpensively communicate, and ultimately create a better environment for us: one where everyday objects act based on what we need and like without explicit instructions
Data-Driven Methods for Data Center Operations Support
During the last decade, cloud technologies have been evolving at
an impressive pace, such that we are now living in a cloud-native
era where developers can leverage on an unprecedented landscape
of (possibly managed) services for orchestration, compute, storage,
load-balancing, monitoring, etc. The possibility to have on-demand
access to a diverse set of configurable virtualized resources allows
for building more elastic, flexible and highly-resilient distributed
applications. Behind the scenes, cloud providers sustain the heavy
burden of maintaining the underlying infrastructures, consisting in
large-scale distributed systems, partitioned and replicated among
many geographically dislocated data centers to guarantee scalability,
robustness to failures, high availability and low latency. The larger the
scale, the more cloud providers have to deal with complex interactions
among the various components, such that monitoring, diagnosing and
troubleshooting issues become incredibly daunting tasks.
To keep up with these challenges, development and operations
practices have undergone significant transformations, especially in
terms of improving the automations that make releasing new software,
and responding to unforeseen issues, faster and sustainable at scale.
The resulting paradigm is nowadays referred to as DevOps. However,
while such automations can be very sophisticated, traditional DevOps
practices fundamentally rely on reactive mechanisms, that typically
require careful manual tuning and supervision from human experts.
To minimize the risk of outages—and the related costs—it is crucial to
provide DevOps teams with suitable tools that can enable a proactive
approach to data center operations.
This work presents a comprehensive data-driven framework to address
the most relevant problems that can be experienced in large-scale
distributed cloud infrastructures. These environments are indeed characterized
by a very large availability of diverse data, collected at each
level of the stack, such as: time-series (e.g., physical host measurements,
virtual machine or container metrics, networking components
logs, application KPIs); graphs (e.g., network topologies, fault graphs
reporting dependencies among hardware and software components,
performance issues propagation networks); and text (e.g., source code,
system logs, version control system history, code review feedbacks).
Such data are also typically updated with relatively high frequency,
and subject to distribution drifts caused by continuous configuration
changes to the underlying infrastructure. In such a highly dynamic scenario,
traditional model-driven approaches alone may be inadequate
at capturing the complexity of the interactions among system components. DevOps teams would certainly benefit from having robust
data-driven methods to support their decisions based on historical
information. For instance, effective anomaly detection capabilities may
also help in conducting more precise and efficient root-cause analysis.
Also, leveraging on accurate forecasting and intelligent control
strategies would improve resource management.
Given their ability to deal with high-dimensional, complex data,
Deep Learning-based methods are the most straightforward option for
the realization of the aforementioned support tools. On the other hand,
because of their complexity, this kind of models often requires huge
processing power, and suitable hardware, to be operated effectively
at scale. These aspects must be carefully addressed when applying
such methods in the context of data center operations. Automated
operations approaches must be dependable and cost-efficient, not to
degrade the services they are built to improve.
i