Search CORE

44,309 research outputs found

Two ways to Grid: the contribution of Open Grid Services Architecture (OGSA) mechanisms to service-centric and resource-centric lifecycles

Author: Brebner P.
Emmerich W.
Publication venue
Publication date: 31/01/2006
Field of study

Service Oriented Architectures (SOAs) support service lifecycle tasks, including Development, Deployment, Discovery and Use. We observe that there are two disparate ways to use Grid SOAs such as the Open Grid Services Architecture (OGSA) as exemplified in the Globus Toolkit (GT3/4). One is a traditional enterprise SOA use where end-user services are developed, deployed and resourced behind firewalls, for use by external consumers: a service-centric (or ‘first-order’) approach. The other supports end-user development, deployment, and resourcing of applications across organizations via the use of execution and resource management services: A Resource-centric (or ‘second-order’) approach. We analyze and compare the two approaches using a combination of empirical experiments and an architectural evaluation methodology (scenario, mechanism, and quality attributes) to reveal common and distinct strengths and weaknesses. The impact of potential improvements (which are likely to be manifested by GT4) is estimated, and opportunities for alternative architectures and technologies explored. We conclude by investigating if the two approaches can be converged or combined, and if they are compatible on shared resources

UCL Discovery

Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy

Author: Bertram Ludäscher
Brian
Junfei Qiu
Matei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/10/2018
Field of study

Data collection for scientific applications is increasing exponentially and is forecasted to soon reach peta- and exabyte scales. Applications which process and analyze scientific data must be scalable and focus on execution performance to keep pace. In the field of radio astronomy, in addition to increasingly large datasets, tasks such as the identification of transient radio signals from extrasolar sources are computationally expensive. We present a scalable approach to radio pulsar detection written in Scala that parallelizes candidate identification to take advantage of in-memory task processing using Apache Spark on a YARN distributed system. Furthermore, we introduce a novel automated multiclass supervised machine learning technique that we combine with feature selection to reduce the time required for candidate classification. Experimental testing on a Beowulf cluster with 15 data nodes shows that the parallel implementation of the identification algorithm offers a speedup of up to 5X that of a similar multithreaded implementation. Further, we show that the combination of automated multiclass classification and feature selection speeds up the execution performance of the RandomForest machine learning algorithm by an average of 54% with less than a 2% average reduction in the algorithm's ability to correctly classify pulsars. The generalizability of these results is demonstrated by using two real-world radio astronomy data sets.Comment: In Proceedings of the 47th International Conference on Parallel Processing (ICPP 2018). ACM, New York, NY, USA, Article 11, 11 page

arXiv.org e-Print Archive

Crossref

Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

Author: Canon Shane
Chhugani Jatin
Demmel James
Devarakonda Aditya
Gerhardt Lisa
Gittens Alex
Harrell Jim
Kottalam Jey
Krishnamurthy Venkat
Liu Jialin
Mahoney Michael W.
Maschhoff Kristyn
Prabhat
Racah Evan
Ringenburg Michael
Sharma Pramod
Yang Jiyan
Publication venue
Publication date: 12/05/2016
Field of study

We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely-used and important matrix factorizations: NMF (for physical plausability), PCA (for its ubiquity) and CX (for data interpretability). We apply these methods to TB-sized problems in particle physics, climate modeling and bioimaging. The data matrices are tall-and-skinny which enable the algorithms to map conveniently into Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance

arXiv.org e-Print Archive

eScholarship - University of California