A Cyberinfrastructure for BigData Transportation Engineering
Big Data-driven transportation engineering has the potential to improve
utilization of road infrastructure, decrease traffic fatalities, improve fuel
consumption, and reduce construction worker injuries, among other benefits.
Despite these benefits, research on Big Data-driven transportation engineering
is difficult today due to the computational expertise required to get started.
This work proposes BoaT, a transportation-specific programming language, and
its Big Data infrastructure, which together aim to lower this barrier to
entry. Our evaluation, which uses over two dozen research questions from six
categories, shows that research is easier to realize as a BoaT computer
program, an order of magnitude faster when this program is run, and exhibits a
12-14x decrease in storage requirements.
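The abstract does not show BoaT syntax, but its gains come from expressing analyses as declarative aggregations over transportation data. As a rough illustration of that map/aggregate pattern (hypothetical segment IDs and speeds, not BoaT code), a Python sketch:

```python
from collections import defaultdict

# Hypothetical trip records: (road_segment_id, speed_mph); illustrative data only.
trips = [("I-35:12", 61.0), ("I-35:12", 48.5), ("US-30:4", 55.0), ("I-35:12", 50.5)]

def mean_speed_by_segment(records):
    """Compute mean speed per road segment: the kind of grouped
    reduction a BoaT-style query would express declaratively and
    run in parallel over its Big Data infrastructure."""
    totals = defaultdict(lambda: [0.0, 0])
    for segment, speed in records:
        totals[segment][0] += speed
        totals[segment][1] += 1
    return {seg: s / n for seg, (s, n) in totals.items()}

print(mean_speed_by_segment(trips))
```

Because each record contributes independently to its group's running sum, the same reduction distributes naturally across workers, which is where the reported order-of-magnitude speedup would come from.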
MOLNs: A cloud platform for interactive, reproducible and scalable spatial stochastic computational experiments in systems biology using PyURDME
Computational experiments using spatial stochastic simulations have led to
important new biological insights, but they require specialized tools, a
complex software stack, and large, scalable compute and data analysis
resources due to the high computational cost of Monte Carlo workflows. The
complexity of setting up and managing a
large-scale distributed computation environment to support productive and
reproducible modeling can be prohibitive for practitioners in systems biology.
This results in a barrier to the adoption of spatial stochastic simulation
tools, effectively limiting the type of biological questions addressed by
quantitative modeling. In this paper, we present PyURDME, a new, user-friendly
spatial modeling and simulation package, and MOLNs, a cloud computing appliance
for distributed simulation of stochastic reaction-diffusion models. MOLNs is
based on IPython and provides an interactive programming platform for
development of sharable and reproducible distributed parallel computational
experiments.
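The Monte Carlo cost the platform is built to absorb comes from repeating stochastic trajectories many times. A minimal sketch of that kernel, using the classic Gillespie stochastic simulation algorithm for a single decay reaction (a textbook illustration, not PyURDME's API):

```python
import random

def ssa_decay(n0, k, t_end, rng):
    """Gillespie stochastic simulation of a single decay reaction
    A -> 0 with rate k. Each call produces one random trajectory;
    spatial stochastic workflows repeat this over many trajectories
    and subvolumes, which is what makes them expensive."""
    t, n = 0.0, n0
    while n > 0:
        a = k * n                   # total reaction propensity
        t += rng.expovariate(a)     # exponentially distributed waiting time
        if t > t_end:
            break
        n -= 1                      # fire one decay event
    return n

rng = random.Random(42)
# Average over many independent trajectories (embarrassingly parallel,
# hence a natural fit for distributed execution on a cloud platform).
mean_n = sum(ssa_decay(100, 0.1, 5.0, rng) for _ in range(2000)) / 2000
```

Since the trajectories are independent, they can be farmed out to workers with no coordination beyond the final average, which is the kind of distributed experiment MOLNs manages.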
Overview of Caching Mechanisms to Improve Hadoop Performance
In today's distributed computing environments, large amounts of data are
generated from different sources at high velocity, making the data
difficult to capture, manage, and process with existing relational databases.
Hadoop is a tool to store and process large datasets in a parallel manner
across a cluster of machines in a distributed environment. Hadoop brings many
benefits like flexibility, scalability, and high fault tolerance; however, it
faces some challenges in terms of data access time, I/O operation, and
duplicate computations resulting in extra overhead, resource wastage, and poor
performance. Many researchers have utilized caching mechanisms to tackle these
challenges. For example, they have presented approaches to improve data access
time, enhance data locality rate, remove repetitive calculations, reduce the
number of I/O operations, decrease the job execution time, and increase
resource efficiency. In the current study, we provide a comprehensive overview
of caching strategies to improve Hadoop performance. Additionally, a novel
classification is introduced based on cache utilization. Using this
classification, we analyze the impact on Hadoop performance and discuss the
advantages and disadvantages of each group. Finally, a novel hybrid approach
called Hybrid Intelligent Cache (HIC) that combines the benefits of two methods
from different groups, H-SVM-LRU and CLQLMRS, is presented. Experimental
results show that our hybrid method achieves an average improvement of 31.2% in
job execution time.
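Several of the surveyed approaches (H-SVM-LRU among them) build on least-recently-used eviction to keep hot data blocks in memory and avoid repeated I/O and recomputation. A minimal sketch of that policy, assuming nothing about the papers' actual implementations:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal least-recently-used cache: evict the entry that has
    gone unused longest once capacity is exceeded, so frequently
    accessed blocks avoid repeated disk I/O."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)         # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("block1", b"...")
cache.put("block2", b"...")
cache.get("block1")           # block1 becomes most recently used
cache.put("block3", b"...")   # capacity exceeded: block2 is evicted
```

The surveyed hybrid methods differ mainly in how they decide *what* to cache (e.g. learned predictions of block reuse) rather than in this basic eviction mechanism.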
An Approach to Ad hoc Cloud Computing
We consider how underused computing resources within an enterprise may be
harnessed to improve utilization and create an elastic computing
infrastructure. Most current cloud provision involves a data center model, in
which clusters of machines are dedicated to running cloud infrastructure
software. We propose an additional model, the ad hoc cloud, in which
infrastructure software is distributed over resources harvested from machines
already in existence within an enterprise. In contrast to the data center cloud
model, resource levels are not established a priori, nor are resources
dedicated exclusively to the cloud while in use. A participating machine is not
dedicated to the cloud, but has some other primary purpose such as running
interactive processes for a particular user. We outline the major
implementation challenges and one approach to tackling them.
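Because a participating machine keeps its primary purpose, an ad hoc cloud node must decide when it is idle enough to accept work. One plausible admission check, sketched here with an illustrative load threshold (the paper does not specify its policy), compares the 1-minute load average against a fraction of the core count:

```python
import os

def willing_to_host(load1, n_cpus, utilization_cap=0.5):
    """Decide whether this machine should accept ad hoc cloud work:
    only when the 1-minute load average stays below a fraction of
    the core count, so the interactive user keeps priority.
    (utilization_cap=0.5 is an illustrative policy knob.)"""
    return load1 < utilization_cap * n_cpus

# In practice the inputs would come from the running system (Unix), e.g.:
#   load1, _, _ = os.getloadavg(); n_cpus = os.cpu_count()
print(willing_to_host(load1=0.8, n_cpus=4))   # lightly loaded machine
print(willing_to_host(load1=3.5, n_cpus=4))   # busy machine
```

A real implementation would also need to suspend or migrate cloud work when the primary user returns, which is one of the major challenges the paper outlines.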
Harnessing the Power of Digital Transformation, Artificial Intelligence and Big Data Analytics with Parallel Computing
Traditionally, 2D and especially 3D forward modeling and inversion of large geophysical datasets have been performed on supercomputing clusters, because the computing time required on a PC was prohibitive. With the introduction of parallel computing, attempts have been made to perform computationally intensive tasks on PCs or clusters of personal computers whose computing power was based on the Central Processing Unit (CPU). This has been further enhanced by the Graphical Processing Unit (GPU), which has become affordable with the launch of GPU-based computing devices. This paper therefore presents a didactic concept for learning and applying parallel computing with a General Purpose Graphical Processing Unit (GPGPU), together with preliminary testing of the migration of existing sequential codes, initially for solving 2D forward modeling of geophysical datasets. There are many challenges in performing these tasks, mainly due to the lack of some necessary software development tools, but the preliminary findings are promising.
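The appeal of GPGPU for 2D forward modeling is that grid-based kernels are data-parallel: every cell depends only on its neighbours. A pure-Python sketch of one such kernel, a Jacobi relaxation sweep of the 2D Laplace equation (an illustrative stand-in, not the paper's code), shows the structure a GPU would assign one thread per cell:

```python
def laplacian_step(u):
    """One Jacobi relaxation sweep for the 2-D Laplace equation, the
    kind of grid stencil at the heart of 2-D forward modeling. Each
    interior cell is replaced by the average of its four neighbours;
    since every update reads only the old grid, all cells can be
    computed independently (one GPU thread per cell)."""
    rows, cols = len(u), len(u[0])
    out = [row[:] for row in u]        # boundaries are left unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1])
    return out

# Toy 5x5 grid: fixed boundary value 1.0 on the top edge, 0.0 elsewhere.
grid = [[1.0] * 5] + [[0.0] * 5 for _ in range(4)]
for _ in range(50):
    grid = laplacian_step(grid)
```

Migrating such a sequential double loop to a GPGPU kernel is mostly mechanical, which is why stencil codes are a common first target when porting geophysical solvers.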
Big Data Management Challenges, Approaches, Tools and their limitations
Big Data is the buzzword everyone talks about. Independently of the application domain, there is today a consensus about the V's characterizing Big Data: Volume, Variety, and Velocity. Focusing on data management issues and on past experience in the area of database systems, this chapter examines the main challenges involved in the three V's of Big Data. It then reviews the main characteristics of existing solutions for addressing each of the V's (e.g., NoSQL, parallel RDBMS, stream data management systems, and complex event processing systems). Finally, it provides a classification of the different functions offered by NewSQL systems and discusses their benefits and limitations for processing Big Data.
Hadoop Performance Analysis on Raspberry Pi for DNA Sequence Alignment
The rapid development of electronic data has brought two major challenges, namely, how to store big data and how to process it. Two main problems in processing big data are the high cost and the required computational power. Hadoop, one of the open source frameworks for processing big data, uses a distributed computational model designed to run on commodity hardware. The aim of this research is to analyze a Hadoop cluster on Raspberry Pi as commodity hardware for DNA sequence alignment. Six Raspberry Pi Model B units and the Biodoop library were used in this research for DNA sequence alignment. The length of the DNA sequences used in this research is between 5,639 bp and 13,271 bp. The results showed that the Hadoop cluster ran on the Raspberry Pi with average processor usage of 73.08%, 334.69 MB of memory, and a job completion time of 19.89 minutes. The distribution of Hadoop data file blocks was found to reduce processor usage by as much as 24.14% and memory usage by as much as 8.49%. However, this increased job processing time by as much as 31.53%.
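The workload being distributed here is pairwise sequence alignment, whose dynamic-programming core is what makes multi-kilobase comparisons computationally heavy. A compact sketch of that core, the Needleman-Wunsch global alignment score (illustrative scoring values, not the paper's or Biodoop's):

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score between sequences a
    and b, the O(len(a) * len(b)) dynamic program at the heart of
    DNA alignment workloads. Only two DP rows are kept, so memory
    stays linear in the shorter sequence."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

print(nw_score("ACGT", "ACGA"))  # → 2 (three matches, one mismatch)
```

The quadratic cost per pair is why distributing many such comparisons across even modest nodes like Raspberry Pis is attractive, and why the block-distribution overheads the paper measures matter.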