    A Hadoop use case for engineering data

    This paper presents the VELaSSCo project (Visualization for Extremely LArge-Scale Scientific Computing). It aims to develop a platform to manipulate scientific data used by FEM (Finite Element Method) and DEM (Discrete Element Method) simulations. The project focuses on the development of a distributed, heterogeneous and high-performance platform, enabling the scientific communities to store, process and visualize huge amounts of data. The platform is compatible with current hardware capabilities, as well as future hardware

    QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

    Previous studies have reported that common dense linear algebra operations do not achieve speedup by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific applications, conventional supercomputers are still strongly predominant in high-performance computing, and the use of grids for speeding up large-scale scientific problems is limited to applications exhibiting parallelism at a higher level. We have identified two performance bottlenecks in the distributed memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear algebra library. First, because ScaLAPACK assumes a homogeneous communication network, the implementations of ScaLAPACK algorithms lack locality in their communication pattern. Second, the number of messages sent in the ScaLAPACK algorithms is significantly greater than in other algorithms that trade flops for communication. In this paper, we present a new approach for computing a QR factorization -- one of the main dense linear algebra kernels -- of tall and skinny matrices in a grid computing environment that overcomes these two bottlenecks. Our contribution is to articulate a recently proposed algorithm (Communication Avoiding QR) with a topology-aware middleware (QCG-OMPI) in order to confine intensive communications (ScaLAPACK calls) within the different geographical sites. An experimental study conducted on the Grid'5000 platform shows that the resulting performance increases linearly with the number of geographical sites on large-scale problems (and is in particular consistently higher than ScaLAPACK's). Comment: Accepted at IPDPS'10 (IEEE International Parallel & Distributed Processing Symposium 2010, Atlanta, GA, USA).
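
    The communication-avoiding idea for tall and skinny matrices can be illustrated in a few lines: factor independent row blocks locally, then factor the small stacked R factors in a single reduction step. The sketch below, in plain NumPy, shows only this numerical scheme; it is not the paper's QCG-OMPI/ScaLAPACK implementation, and the block count and matrix sizes are arbitrary assumptions.

```python
# Minimal TSQR-style sketch: local QR per row block, then one QR of the
# stacked R factors. Illustrative only; not the QCG-OMPI/ScaLAPACK code.
import numpy as np

def tsqr_r(A, num_blocks=4):
    """Return the R factor of A = QR using a one-level reduction."""
    blocks = np.array_split(A, num_blocks, axis=0)
    # Local QR on each block: the embarrassingly parallel part that would
    # run inside a single geographical site.
    local_rs = [np.linalg.qr(b, mode="r") for b in blocks]
    # One small reduction across blocks: the only inter-site exchange.
    return np.linalg.qr(np.vstack(local_rs), mode="r")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((100_000, 32))        # tall and skinny
    R = tsqr_r(A)
    R_ref = np.linalg.qr(A, mode="r")
    # R is unique up to the signs of its rows, so compare magnitudes.
    print(np.allclose(np.abs(R), np.abs(R_ref)))
```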

    Scalable Teacher Forcing Network for Semi-Supervised Large Scale Data Streams

    The large-scale data stream problem refers to high-speed information flows that cannot be processed in a scalable manner on a traditional computing platform. The problem also imposes an expensive labelling cost, making the deployment of fully supervised algorithms infeasible. On the other hand, semi-supervised large-scale data streams are little explored in the literature, because most works are designed for traditional single-node computing environments and are fully supervised. This paper offers the Weakly Supervised Scalable Teacher Forcing Network (WeScatterNet) to cope with the scarcity of labelled samples and with large-scale data streams simultaneously. WeScatterNet is crafted on the distributed computing platform of Apache Spark, with a data-free model fusion strategy for model compression after the parallel computing stage. It features an open network structure to address the global and local drift problems, while integrating a data augmentation, annotation and auto-correction (DA^3) method for handling partially labelled data streams. The performance of WeScatterNet is numerically evaluated on six large-scale data stream problems with only 25% label proportions. It shows highly competitive performance even when compared with fully supervised learners trained on 100% label proportions. Comment: This paper has been accepted for publication in Information Sciences.
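
    The core difficulty the abstract addresses -- learning when only a small fraction of stream samples is labelled -- can be illustrated with a generic self-training (pseudo-labelling) loop. The sketch below shows only that generic idea on a single node with scikit-learn; it is not WeScatterNet, its DA^3 procedure, or its Spark implementation, and the model, confidence threshold and synthetic data are assumptions.

```python
# Generic pseudo-labelling sketch for a partially labelled batch.
# Not WeScatterNet; just the "learn from scarce labels" idea.
import numpy as np
from sklearn.linear_model import SGDClassifier

def fit_partially_labelled(X, y, threshold=0.9):
    """y uses -1 for unlabelled samples."""
    labelled = y != -1
    clf = SGDClassifier(loss="log_loss", random_state=0)
    clf.fit(X[labelled], y[labelled])                 # warm-up on labelled data
    proba = clf.predict_proba(X[~labelled])
    confident = proba.max(axis=1) >= threshold        # keep confident pseudo-labels
    pseudo_y = clf.classes_[proba.argmax(axis=1)[confident]]
    X_aug = np.vstack([X[labelled], X[~labelled][confident]])
    y_aug = np.concatenate([y[labelled], pseudo_y])
    clf.fit(X_aug, y_aug)                             # retrain on the augmented set
    return clf

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 5))
    y_true = (X[:, 0] > 0).astype(int)
    y = np.where(rng.random(1000) < 0.25, y_true, -1)   # roughly 25% labelled
    model = fit_partially_labelled(X, y)
    print((model.predict(X) == y_true).mean())
```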

    Parallel and Context Based Search in Cloud using Multi Agent System

    Cloud computing is one of the fastest-growing technologies. It supports the large-scale infrastructure used to deliver high-performance computing. The technology also supports agents, and the integration of agents into a Multi Agent System (MAS) enables intelligent behaviour. Agents run in an environment where they communicate with each other using message passing. Each agent has its own set of behaviours and runs independently of the others; when a message arrives, each agent exhibits its own behaviour, and through these exchanges the agents coordinate. Using MAS in cloud computing helps us search for context with better performance. JADE is a platform that supports agents. This paper discusses cloud computing models and architectures, information retrieval techniques, and the use of MAS to improve the performance of big data search over a Distributed File System (DFS), which is difficult to achieve using a single agent or thread. Keywords: Cloud Computing, Distributed File System, JADE, MAS
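
    The coordination pattern described above -- independent agents that each react to incoming messages and search their own part of the data -- can be sketched with ordinary threads and queues. The code below is a plain-Python illustration of that message-passing idea, not JADE (a Java framework) and not the paper's system; the partitions, messages and search term are invented.

```python
# Message-passing "agents" searching data partitions in parallel.
# Plain-Python illustration only; not JADE and not the paper's system.
import queue
import threading

def search_agent(partition, term, inbox, result_queue):
    """Wait for a 'search' message, scan the local partition, report hits."""
    while True:
        msg = inbox.get()
        if msg == "stop":
            break
        hits = [doc for doc in partition if term in doc]
        result_queue.put(hits)

if __name__ == "__main__":
    partitions = [["alpha cloud", "beta grid"], ["cloud agent"], ["gamma"]]
    term = "cloud"
    results = queue.Queue()
    inboxes = [queue.Queue() for _ in partitions]
    agents = [threading.Thread(target=search_agent, args=(p, term, ib, results))
              for p, ib in zip(partitions, inboxes)]
    for a in agents:
        a.start()
    for ib in inboxes:
        ib.put("search")                       # ask every agent to search
    merged = [results.get() for _ in agents]   # collect one reply per agent
    for ib in inboxes:
        ib.put("stop")
    for a in agents:
        a.join()
    print(merged)
```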

    HPC-GAP: engineering a 21st-century high-performance computer algebra system

    Symbolic computation has underpinned a number of key advances in Mathematics and Computer Science. Applications are typically large and potentially highly parallel, making them good candidates for parallel execution at a variety of scales from multi-core to high-performance computing systems. However, much existing work on parallel computing is based around numeric rather than symbolic computations. In particular, symbolic computing presents specific problems in terms of varying granularity and irregular task sizes that do not match conventional approaches to parallelisation. It also presents problems in terms of the structure of the algorithms and data. This paper describes a new implementation of the free open-source GAP computational algebra system that places parallelism at the heart of the design, dealing with the key scalability and cross-platform portability problems. We provide three system layers that deal with the three most important classes of hardware: individual shared-memory multi-core nodes, mid-scale distributed clusters of (multi-core) nodes, and full-blown HPC systems comprising large-scale tightly-connected networks of multi-core nodes. This requires us to develop new cross-layer programming abstractions in the form of new domain-specific skeletons that allow us to seamlessly target different hardware levels. Our results show that, using our approach, we can achieve good scalability and speedups for two realistic exemplars, on high-performance systems comprising up to 32,000 cores, as well as on ubiquitous multi-core systems and distributed clusters. The work reported here paves the way towards full-scale exploitation of symbolic computation by high-performance computing systems, and we demonstrate the potential with two major case studies.
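
    The "skeleton" idea mentioned above is that the programmer states the computation as, say, a parallel map over independent tasks and the skeleton decides how to place it on the available hardware. The sketch below shows only that general pattern on a single shared-memory node using Python's standard library; it is not HPC-GAP's actual API, and the example task is an arbitrary stand-in for an irregular symbolic computation.

```python
# A toy data-parallel skeleton: the caller supplies a function and a list
# of tasks; the skeleton handles placement on the available cores.
# Illustrative only; not HPC-GAP's skeleton API.
from concurrent.futures import ProcessPoolExecutor

def par_map(func, tasks, workers=None):
    """Apply func to every task in parallel, returning results in order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, tasks))

def count_divisors(n):
    # Stand-in for an irregular task: cost varies strongly with the input.
    return sum(1 for d in range(1, n + 1) if n % d == 0)

if __name__ == "__main__":
    print(par_map(count_divisors, [10_000, 250_000, 999_983, 42]))
```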

    High Throughput Protein Similarity Searches in the LIBI Grid Problem Solving Environment

    Bioinformatics applications are naturally distributed, due to the distribution of the data sets, experimental data and biological databases involved. They require high computing power, owing to the large size of the data sets and the complexity of the basic computations; they may access heterogeneous data, where the heterogeneity lies in data format, access policy, distribution, etc.; and they require a secure infrastructure, because they may access private data owned by different organizations. The Problem Solving Environment (PSE) is an approach and a technology that can fulfil such bioinformatics requirements. A PSE can be used for the definition and composition of complex applications, hiding programming and configuration details from the user, who can then concentrate only on the specific problem. Moreover, Grids can be used for building geographically distributed collaborative problem solving environments, and Grid-aware PSEs can search for and use dispersed high-performance computing, networking and data resources. In this work, the PSE solution has been chosen as the integration platform for bioinformatics tools and data sources. In particular, a large-scale multiple sequence alignment experiment, supported by the LIBI PSE, is presented.
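
    A high-throughput similarity search of the kind described above boils down to splitting a set of query sequences into chunks and comparing each chunk against a database in parallel. The sketch below illustrates only that orchestration, with a deliberately crude shared-k-mer score standing in for the BLAST-like tools a PSE would actually run; it is not the LIBI workflow, and the sequences and parameters are invented.

```python
# Chunked, parallel similarity search with a toy shared-k-mer score.
# Orchestration sketch only; real pipelines would call BLAST-like tools.
from concurrent.futures import ThreadPoolExecutor

def kmers(seq, k=3):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def best_hits(queries, database, k=3):
    """For each query, return the database entry sharing the most k-mers."""
    hits = {}
    for name, q in queries:
        scored = [(len(kmers(q, k) & kmers(s, k)), target) for target, s in database]
        hits[name] = max(scored)[1]
    return hits

if __name__ == "__main__":
    database = [("P1", "MKTAYIAKQR"), ("P2", "GAVLIMCFWP")]
    queries = [("Q1", "MKTAYIA"), ("Q2", "AVLIMCF"), ("Q3", "QRGAVL"), ("Q4", "IMCFWP")]
    chunks = [queries[:2], queries[2:]]                  # split the query set
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(lambda c: best_hits(c, database), chunks))
    merged = {name: hit for part in partial for name, hit in part.items()}
    print(merged)
```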

    Cloud Storage Performance and Security Analysis with Hadoop and GridFTP

    Even though cloud servers have been around for a few years, most web hosts today have not yet converted to the cloud. If the purpose of a cloud server is to distribute and store files on the internet, FTP servers served that purpose long before the cloud and remain sufficient for distributing content. Is it therefore worth shifting from an FTP server to a cloud server? Cloud storage providers promise high durability and availability for their users, and the ability to easily scale up storage space can save users a great deal of money. However, do they provide higher performance and better security features? Hadoop is a very popular platform for cloud computing. It is free software under the Apache License, written in Java, and supports large-scale data processing in a distributed environment. Characteristics of Hadoop include partitioning of data, computing across thousands of hosts, and executing application computations in parallel. The Hadoop Distributed File System (HDFS) allows rapid data transfer at scales of up to thousands of terabytes and can continue operating even in the case of node failure. GridFTP supports high-speed data transfer over wide-area networks; it is based on FTP and features multiple data channels for parallel transfers. This report describes the technology behind HDFS and the enhancement of Hadoop's security features with Kerberos. Based on the data transfer performance and the security features of HDFS and the GridFTP server, we can decide whether to replace the GridFTP server with HDFS. According to our experimental results, we conclude that the GridFTP server provides better throughput than HDFS, and that Kerberos has minimal impact on HDFS performance. We propose a solution in which users first authenticate with HDFS and then transfer the file from the HDFS server to the client using GridFTP.
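
    The throughput comparison the report describes can be approximated by timing a put and a get of the same file through the standard hdfs dfs command-line client. The sketch below assumes a configured Hadoop client on the PATH (and, on a Kerberized cluster, a prior kinit); the paths and file size are arbitrary choices, and GridFTP itself is not exercised here.

```python
# Minimal throughput check through the `hdfs dfs` CLI: time a put and a
# get of one generated file and report MB/s. Assumes a working Hadoop
# client (and a valid Kerberos ticket via `kinit` on a secured cluster).
import os
import subprocess
import time

def timed(cmd):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    local, remote, copy = "sample.bin", "/tmp/sample.bin", "sample_copy.bin"
    size_mb = 256
    with open(local, "wb") as f:
        f.write(os.urandom(size_mb * 1024 * 1024))     # generate test data

    up = timed(["hdfs", "dfs", "-put", "-f", local, remote])
    down = timed(["hdfs", "dfs", "-get", remote, copy])
    print(f"write: {size_mb / up:.1f} MB/s, read: {size_mb / down:.1f} MB/s")
```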