384,993 research outputs found
Data Management Challenges in Cloud Environments
Recently the cloud computing paradigm has been receiving special excitement and attention in the new researches. Cloud computing has the potential to change a large part of the IT activity, making software even more interesting as a service and shaping the way IT hardware is proposed and purchased. Developers with novel ideas for new Internet services no longer require the large capital outlays in hardware to present their service or the human expense to do it. These cloud applications apply large data centers and powerful servers that host Web applications and Web services. This report presents an overview of what cloud computing means, its history along with the advantages and disadvantages. In this paper we describe the problems and opportunities of deploying data management issues on these emerging cloud computing platforms. We study that large scale data analysis jobs, decision support systems, and application specific data marts are more likely to take benefit of cloud computing platforms than operational, transactional database systems.
 
Towards a Reference Architecture with Modular Design for Large-scale Genotyping and Phenotyping Data Analysis: A Case Study with Image Data
With the rapid advancement of computing technologies, various scientific research communities have been extensively using cloud-based software tools or applications. Cloud-based applications allow users to access
software applications from web browsers while relieving them from the installation of any software applications in
their desktop environment. For example, Galaxy, GenAP, and iPlant Colaborative are popular cloud-based
systems for scientific workflow analysis in the domain of plant Genotyping and Phenotyping. These systems are being used for conducting research, devising new techniques, and sharing the computer assisted analysis results among collaborators. Researchers need to integrate their new workflows/pipelines, tools or techniques with the base system over time. Moreover, large scale data need to be processed within the time-line for more effective analysis. Recently, Big Data technologies are emerging for facilitating large scale data processing with commodity hardware. Among the above-mentioned systems, GenAp is utilizing the Big Data technologies for specific cases only. The structure of such a cloud-based system is highly variable and complex in nature. Software architects and developers need to consider totally different properties and challenges during the development and maintenance phases compared to the traditional business/service oriented systems. Recent studies report that software engineers and data engineers confront challenges to develop analytic tools for supporting large scale and heterogeneous data analysis. Unfortunately, less focus has been given by the software researchers to devise a well-defined methodology and frameworks for flexible design of a cloud system for the Genotyping and Phenotyping domain. To that end, more effective design methodologies and frameworks are an urgent need for cloud based Genotyping and Phenotyping analysis system development that also supports large scale data processing.
In our thesis, we conduct a few studies in order to devise a stable reference architecture and modularity model for the software developers and data engineers in the domain of Genotyping and Phenotyping. In the first study, we analyze the architectural changes of existing candidate systems to find out the stability issues. Then, we extract architectural patterns of the candidate systems and propose a conceptual reference architectural model. Finally, we present a case study on the modularity of computation-intensive tasks as an extension of the data-centric development. We show that the data-centric modularity model is at the core of the flexible development of a Genotyping and Phenotyping analysis system. Our proposed model and case study with thousands of images provide a useful knowledge-base for software researchers, developers, and data engineers for cloud based Genotyping and Phenotyping analysis system development
Monitoring Large-Scale Cloud Systems with Layered Gossip Protocols
Monitoring is an essential aspect of maintaining and developing computer
systems that increases in difficulty proportional to the size of the system.
The need for robust monitoring tools has become more evident with the advent of
cloud computing. Infrastructure as a Service (IaaS) clouds allow end users to
deploy vast numbers of virtual machines as part of dynamic and transient
architectures. Current monitoring solutions, including many of those in the
open-source domain rely on outdated concepts including manual deployment and
configuration, centralised data collection and adapt poorly to membership
churn.
In this paper we propose the development of a cloud monitoring suite to
provide scalable and robust lookup, data collection and analysis services for
large-scale cloud systems. In lieu of centrally managed monitoring we propose a
multi-tier architecture using a layered gossip protocol to aggregate monitoring
information and facilitate lookup, information collection and the
identification of redundant capacity. This allows for a resource aware data
collection and storage architecture that operates over the system being
monitored. This in turn enables monitoring to be done in-situ without the need
for significant additional infrastructure to facilitate monitoring services. We
evaluate this approach against alternative monitoring paradigms and demonstrate
how our solution is well adapted to usage in a cloud-computing context.Comment: Extended Abstract for the ACM International Symposium on
High-Performance Parallel and Distributed Computing (HPDC 2013) Poster Trac
On the Easy Use of Scientific Computing Services for Large Scale Linear Algebra and Parallel Decision Making with the P-Grade Portal
International audienceScientific research is becoming increasingly dependent on the large-scale analysis of data using distributed computing infrastructures (Grid, cloud, GPU, etc.). Scientific computing (Petitet et al. 1999) aims at constructing mathematical models and numerical solution techniques for solving problems arising in science and engineering. In this paper, we describe the services of an integrated portal based on the P-Grade (Parallel Grid Run-time and Application Development Environment) portal (http://www.p-grade.hu) that enables the solution of large-scale linear systems of equations using direct solvers, makes easier the use of parallel block iterative algorithm and provides an interface for parallel decision making algorithms. The ultimate goal is to develop a single sign on integrated multi-service environment providing an easy access to different kind of mathematical calculations and algorithms to be performed on hybrid distributed computing infrastructures combining the benefits of large clusters, Grid or cloud, when needed
- …