Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure
Big data research has attracted great attention in science, technology, industry, and society. It is developing alongside the evolving scientific paradigm, the fourth industrial revolution, and transformational innovations in technology. However, its nature and fundamental challenges have not yet been clearly identified, and it lacks a methodology of its own. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing, and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into a knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing, and management? What is the relationship between big data and the scientific paradigm? What are the nature and fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing.
Comment: 59 pages
Virtualizing the Stampede2 Supercomputer with Applications to HPC in the Cloud
Methods developed at the Texas Advanced Computing Center (TACC) are described
and demonstrated for automating the construction of an elastic, virtual cluster
emulating the Stampede2 high performance computing (HPC) system. The cluster
can be built and/or scaled in a matter of minutes on the Jetstream self-service
cloud system and shares many properties of the original Stampede2, including:
i) common identity management, ii) access to the same file systems, iii) an
equivalent software application stack and module system, and iv) a similar job
scheduling interface via Slurm.
We measure time-to-solution for a number of common scientific applications on
our virtual cluster against equivalent runs on Stampede2 and develop an
application profile where performance is similar or otherwise acceptable. For
such applications, the virtual cluster provides an effective form of "cloud
bursting" with the potential to significantly improve overall turnaround time,
particularly when Stampede2 is experiencing long queue wait times. In addition,
the virtual cluster can be used for test and debug without directly impacting
Stampede2. We conclude with a discussion of how science gateways can leverage
the TACC Jobs API web service to incorporate this cloud bursting technique
transparently to the end user.
Comment: 6 pages, 0 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22--26, 2018, Pittsburgh, PA, USA
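The cloud-bursting approach described above relies on submitting jobs through a web service rather than directly to Slurm, so a science gateway never needs to know which cluster actually runs the work. The sketch below illustrates that pattern in Python; the endpoint path, payload fields, and token handling are assumptions made for this illustration, not the actual TACC Jobs API schema.

```python
# Hypothetical illustration: the endpoint path, payload fields, and token
# handling are assumptions for this sketch, not the actual TACC Jobs API schema.
import requests

API_BASE = "https://example.tacc.utexas.edu/jobs"  # placeholder URL
TOKEN = "replace-with-a-real-bearer-token"         # obtained out of band

def submit_job(app_id: str, node_count: int, input_url: str) -> str:
    """Submit a job via the web service; the gateway does not need to know
    whether it runs on Stampede2 or on the virtual cluster."""
    payload = {
        "appId": app_id,            # a registered application wrapper
        "nodeCount": node_count,
        "inputs": {"inputFile": input_url},
    }
    resp = requests.post(
        f"{API_BASE}/submit",
        json=payload,
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["jobId"]     # poll this id later for status and output
```

A gateway could call submit_job() in place of a direct sbatch, and the service behind the placeholder endpoint would decide whether the job lands on Stampede2 or on the virtual cluster.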
Evaluating and Enabling Scalable High Performance Computing Workloads on Commercial Clouds
Performance, usability, and accessibility are critical components of high performance computing (HPC). Usability and performance are especially important to academic researchers, as they generally have little time to learn a new technology and require adequate performance to ensure the quality and quantity of their research results. We have observed that while not all workloads run well in the cloud, some perform well. We have also observed that although commercial cloud adoption by industry has been growing at a rapid pace, its use by academic researchers has not grown as quickly. We aim to help close this gap and enable researchers to utilize the commercial cloud more efficiently and effectively.
We present our results on architecting and benchmarking an HPC environment on Amazon Web Services (AWS), where we observe that particular types of applications are and are not suited to the commercial cloud. We then present our results on architecting and building a provisioning and workflow management tool (PAW), an application that enables a user to launch an HPC environment in the cloud, execute a customizable workflow, and automatically delete the HPC environment after the workflow has completed. We then present our results on the scalability of PAW and the commercial cloud for compute-intensive workloads by deploying a 1.1 million vCPU cluster. We then discuss our research into the feasibility of utilizing commercial cloud infrastructure to help tackle the large spikes and data-intensive characteristics of Transportation Cyberphysical Systems (TCPS) workloads. We then present our research in utilizing the commercial cloud for urgent HPC applications by deploying a 1.5 million vCPU cluster to process 211TB of traffic video data for use by first responders during an evacuation situation. Lastly, we present the contributions and conclusions drawn from this work.
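The launch/run/delete pattern that PAW automates can be summarized in a few lines. The following is a minimal sketch assuming AWS CloudFormation as the provisioning mechanism and a caller-supplied run_workflow hook; it is not the PAW code base, and the stack and template names are placeholders.

```python
# Sketch of the provision -> run -> delete pattern described above. This is
# not the PAW code base; the stack name, template, and run_workflow hook are
# placeholders, and CloudFormation is just one possible provisioning mechanism.
import boto3

def run_elastic_hpc(stack_name: str, template_url: str, run_workflow) -> None:
    cfn = boto3.client("cloudformation")

    # 1) Provision the HPC environment (head node, compute fleet, shared storage).
    cfn.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Capabilities=["CAPABILITY_IAM"],
    )
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)

    try:
        # 2) Execute the user-customizable workflow against the new cluster.
        run_workflow(stack_name)
    finally:
        # 3) Always tear the environment down so idle nodes stop accruing cost.
        cfn.delete_stack(StackName=stack_name)
        cfn.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```

Wrapping the teardown in a finally block mirrors the requirement that the HPC environment be deleted automatically once the workflow completes, so idle instances do not continue to accrue cost.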
Design Space Exploration for Distributed Cyber-Physical Systems: State-of-the-art, Challenges, and Directions
Global Grids and Software Toolkits: A Study of Four Grid Middleware Technologies
The Grid is an infrastructure that involves the integrated and collaborative use of computers, networks, databases, and scientific instruments owned and managed by multiple organizations. Grid applications often involve large amounts of data and/or computing resources that require secure resource sharing across organizational boundaries. This makes Grid application management and deployment a complex undertaking. Grid middleware provides users with seamless computing ability and uniform access to resources in the heterogeneous Grid environment. Several software toolkits and systems, most of them the results of academic research projects, have been developed around the world. This chapter focuses on four of these middleware systems: UNICORE, Globus, Legion, and Gridbus. It also presents our implementation of a resource broker for UNICORE, as this functionality was not originally supported. A comparison of these systems on the basis of their architecture, implementation model, and several other features is included.
Comment: 19 pages, 10 figures
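To make the role of a resource broker concrete, the sketch below shows a simplified matchmaking loop in Python: it filters advertised Grid resources against a job's requirements and ranks the survivors. The resource fields and the ranking rule are illustrative assumptions, not the actual UNICORE or Gridbus broker logic.

```python
# Simplified matchmaking loop; not the actual UNICORE or Gridbus broker code.
# The resource fields and the "most free CPUs" ranking rule are illustrative.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Resource:
    name: str
    free_cpus: int
    memory_gb: int
    software: set[str] = field(default_factory=set)   # modules advertised by the site

@dataclass
class JobRequest:
    cpus: int
    memory_gb: int
    requires: set[str] = field(default_factory=set)   # modules the job needs

def select_resource(job: JobRequest, resources: list[Resource]) -> Optional[Resource]:
    """Return the matching resource with the most free CPUs, or None if no match."""
    candidates = [
        r for r in resources
        if r.free_cpus >= job.cpus
        and r.memory_gb >= job.memory_gb
        and job.requires <= r.software     # all required modules must be advertised
    ]
    return max(candidates, key=lambda r: r.free_cpus, default=None)
```

A production broker would additionally handle credential delegation, data staging, and dynamic load information, but the filter-and-rank core shown here is the basic idea.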
Calm before the storm: the challenges of cloud computing in digital forensics
Cloud computing is a rapidly evolving information technology (IT) phenomenon. Rather than procure, deploy, and manage a physical IT infrastructure to host their software applications, organizations are increasingly deploying their infrastructure into remote, virtualized environments, often hosted and managed by third parties. This development has significant implications for digital forensic investigators, equipment vendors, law enforcement, and corporate compliance and audit departments, among others. Much of digital forensic practice assumes careful control and management of IT assets (particularly data storage) during the conduct of an investigation. This paper summarises the key aspects of cloud computing and analyses how established digital forensic procedures will be invalidated in this new environment. Several new research challenges addressing this changing context are also identified and discussed.