
    A Logical Model and Data Placement Strategies for MEMS Storage Devices

    MEMS storage devices are new non-volatile secondary storage devices with outstanding advantages over magnetic disks. They differ greatly from magnetic disks, however, in structure and access characteristics. They have thousands of heads, called probe tips, and provide two major access facilities: (1) flexibility: freely selecting a set of probe tips for accessing data, and (2) parallelism: simultaneously reading and writing data with the selected set of probe tips. Due to these characteristics, it is nontrivial to find data placements that fully utilize the capability of MEMS storage devices. In this paper, we propose a simple logical model called the Region-Sector (RS) model that abstracts the major characteristics affecting data retrieval performance, such as flexibility and parallelism, from the physical MEMS storage model. We also suggest heuristic data placement strategies based on the RS model and derive new data placements for relational data and two-dimensional spatial data using those strategies. Experimental results show that the proposed data placements improve data retrieval performance by up to 4.0 times for relational data and by up to 4.8 times for two-dimensional spatial data of approximately 320 Mbytes, compared with existing data placements. Further, these improvements are expected to become more marked as the database size grows. (Comment: 37 pages)
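    For illustration only, here is a minimal Python sketch, not the paper's RS model implementation (all names and numbers are invented), of the property the abstract describes: once a set of probe tips is selected, the tips transfer data in parallel, so retrieval time is governed by the most heavily loaded tip, which is why balanced placements pay off.

```python
# Illustrative toy model (not from the paper): a Region-Sector-style view
# of a MEMS device, where a read selects a subset of probe tips
# (flexibility) and all selected tips transfer sectors concurrently
# (parallelism), so elapsed time is the longest per-tip workload.

from dataclasses import dataclass

@dataclass
class RSDevice:
    num_tips: int        # probe tips available for concurrent access
    sector_time: float   # time to transfer one sector with one tip

    def read(self, tip_sectors: dict[int, int]) -> float:
        """tip_sectors maps a selected tip id -> sectors it must read.

        Because selected tips operate in parallel, total time is the
        maximum per-tip time, not the sum -- the property a good data
        placement exploits by balancing sectors across tips.
        """
        assert all(0 <= t < self.num_tips for t in tip_sectors)
        return max(tip_sectors.values()) * self.sector_time

dev = RSDevice(num_tips=4, sector_time=1.0)
print(dev.read({0: 8, 1: 8, 2: 8, 3: 8}))  # balanced placement:  8.0
print(dev.read({0: 32}))                   # serial placement:   32.0
```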

    Cloud computing resource scheduling and a survey of its evolutionary approaches

    A disruptive technology that is fundamentally transforming the way computing services are delivered, cloud computing offers information and communication technology users convenient, on-demand access to resources as services via the Internet. Because the cloud provides a finite pool of virtualized on-demand resources, scheduling them optimally has become an essential and rewarding topic, in which a trend of using Evolutionary Computation (EC) algorithms is rapidly emerging. By analyzing the cloud computing architecture, this survey first presents a two-level taxonomy of cloud resource scheduling. It then paints a landscape of the scheduling problem and its solutions. Following the taxonomy, a comprehensive survey of state-of-the-art approaches is presented systematically. Looking forward, challenges and potential future research directions are investigated, including real-time scheduling, adaptive dynamic scheduling, large-scale scheduling, multiobjective scheduling, and distributed and parallel scheduling. At the dawn of Industry 4.0, cloud computing scheduling for cyber-physical integration in the presence of big data is also discussed. Research in this area is only in its infancy, but with the rapid fusion of information and data technology, more exciting and agenda-setting topics are likely to emerge on the horizon.
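    As a rough illustration of the EC trend the survey describes, and not of any particular surveyed algorithm, the following Python sketch runs a toy genetic algorithm that assigns tasks to VMs to minimize makespan; the task lengths, VM speeds, and GA parameters are invented.

```python
# Minimal, generic sketch of evolutionary cloud scheduling: a genetic
# algorithm searching over task-to-VM assignments to minimize makespan.

import random

TASKS = [4, 7, 3, 9, 2, 6, 5, 8]   # task lengths (arbitrary units)
VM_SPEEDS = [1.0, 2.0, 1.5]        # relative VM processing speeds

def makespan(assign):               # assign[i] = VM index for task i
    load = [0.0] * len(VM_SPEEDS)
    for task, vm in zip(TASKS, assign):
        load[vm] += task / VM_SPEEDS[vm]
    return max(load)

def evolve(pop_size=30, gens=200, mut=0.1):
    pop = [[random.randrange(len(VM_SPEEDS)) for _ in TASKS]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=makespan)                    # fitness: lower makespan
        survivors = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(TASKS)) # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(len(child)):           # per-gene mutation
                if random.random() < mut:
                    child[i] = random.randrange(len(VM_SPEEDS))
            children.append(child)
        pop = survivors + children
    return min(pop, key=makespan)

best = evolve()
print(best, makespan(best))
```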

    Scalable Persistent Storage for Erlang

    The many-core revolution makes scalability a key property. The RELEASE project aims to improve the scalability of Erlang on emergent commodity architectures with 100,000 cores. Such architectures require scalable and available persistent storage on up to 100 hosts. We enumerate the requirements for scalable and available persistent storage and evaluate four popular Erlang DBMSs against them. This analysis shows that Mnesia and CouchDB are not suitable for persistent storage at our target scale, but Dynamo-like NoSQL DataBase Management Systems (DBMSs) such as Cassandra and Riak potentially are. We investigate the practical scalability limits of the Riak 1.1.1 NoSQL DBMS on a 100-node cluster. We establish scientifically, for the first time, that the scalability limit of Riak is 60 nodes on the Kalkyl cluster, thereby confirming developer folklore. We show that resources like memory, disk, and network do not limit the scalability of Riak. By instrumenting Erlang/OTP and Riak libraries, we identify a specific Riak functionality that limits scalability. We outline how later releases of Riak are refactored to eliminate the scalability bottlenecks. We conclude that Dynamo-style NoSQL DBMSs provide scalable and available persistent storage for Erlang in general, and for our RELEASE target architecture in particular.
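    As a loose sketch of the kind of scalability measurement the study performs, and not the RELEASE benchmark itself, the following Python harness finds the point where throughput stops improving with cluster size. The `run_workload` function is a hypothetical stub standing in for a real client driving a Riak cluster; it is wired to plateau near the 60-node limit the paper reports, purely so the sketch is runnable.

```python
# Hedged sketch: locate a scalability limit as the largest cluster size
# that still yields a meaningful throughput gain over the previous size.

def run_workload(num_nodes: int) -> float:
    """Hypothetical stand-in: throughput (ops/s) for a cluster of
    `num_nodes`. A real harness would drive Riak's client API; this
    stub simply plateaus at 60 nodes, echoing the paper's finding."""
    return min(num_nodes, 60) * 1000.0

def scalability_limit(sizes, tolerance=0.05):
    """Largest size that improved throughput by more than `tolerance`."""
    prev_tput, limit = 0.0, sizes[0]
    for n in sizes:
        tput = run_workload(n)
        if prev_tput and tput < prev_tput * (1 + tolerance):
            return limit          # plateau reached: last improving size
        prev_tput, limit = tput, n
    return limit

print(scalability_limit([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))  # 60
```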

    Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)

    Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even after careful tuning for an in-memory environment, a linear disk I/O model such as the one implemented in PostgreSQL may yield query response times up to 2X slower than the optimal multi-join query plan over memory-resident data. This paper introduces a memory I/O cost model to identify good evaluation strategies for complex query plans with multiple hash-based equi-joins over memory-resident data. The proposed cost model is carefully validated for accuracy using three different systems, including an Amazon EC2 instance, to control for hardware-specific differences. Prior work in parallel query evaluation has advocated right-deep and bushy trees for multi-join queries due to their greater parallelization and pipelining potential. A surprising finding is that this conventional wisdom from shared-nothing disk-based systems does not directly apply to the modern shared-everything memory hierarchy. As corroborated by our model, the performance gap between the optimal left-deep and right-deep query plan can grow to about 10X as the number of joins in the query increases. (Comment: 15 pages, 8 figures; extended version of the paper to appear in SoCC'1)
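    The following toy Python model, which is not the paper's validated cost model, illustrates the underlying idea: charging each hash join a per-tuple memory cost for building and probing makes the predicted cost of a multi-join chain depend strongly on plan shape and join order. The relation sizes, per-tuple costs, and selectivities are invented.

```python
# Toy per-tuple memory-traffic model for a chain of hash equi-joins:
# each join builds a hash table on one input and probes it with the
# running intermediate result. Different orderings of the same
# relations get very different predicted costs.

BUILD_COST = 2.0   # per tuple: hash + write into the hash table
PROBE_COST = 1.0   # per tuple: hash + lookup

def join_cost(build_tuples, probe_tuples):
    return BUILD_COST * build_tuples + PROBE_COST * probe_tuples

def chain_cost(sizes, selectivity=0.001):
    """Cost of joining R1..Rn in a chain: each step builds on the next
    base relation and probes with the current intermediate result."""
    total, inter = 0.0, sizes[0]
    for s in sizes[1:]:
        total += join_cost(build_tuples=s, probe_tuples=inter)
        inter = inter * s * selectivity   # crude intermediate estimate
    return total

print(chain_cost([1_000_000, 1_000, 1_000, 1_000]))  # big relation first
print(chain_cost([1_000, 1_000, 1_000, 1_000_000]))  # big relation last
```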

    Optimal fault-tolerant placement of relay nodes in a mission critical wireless network

    The operations of many critical infrastructures (e.g., airports) depend heavily on the proper functioning of the radio communication network supporting operations. Such a communication network is therefore a mission-critical network that needs adequate protection from external electromagnetic interference. This is usually done through radiogoniometers: by using at least three suitably deployed radiogoniometers and a gateway gathering information from them, sources of electromagnetic emissions that are not supposed to be present in the monitored area can be localised. Typically, relay nodes are used to connect the radiogoniometers to the gateway, so some degree of fault tolerance in the network of relay nodes is essential for reliable monitoring. On the other hand, deploying relay nodes is typically quite expensive. We thus have two conflicting requirements: minimising cost while guaranteeing a given degree of fault tolerance. In this paper we address the problem of computing a deployment of relay nodes that minimises the relay network cost while guaranteeing proper operation of the network even when some of the relay nodes (up to a given maximum number) become faulty. We show that this problem can be formulated both as a Mixed Integer Linear Programming (MILP) problem and as a Pseudo-Boolean Satisfiability (PB-SAT) optimisation problem, and we present experimental results comparing the two approaches on realistic scenarios.
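    As a hedged sketch of the MILP side of such a formulation, simplified and not the paper's model, the following Python program uses the PuLP library (pip install pulp) to minimise relay deployment cost while requiring every radiogoniometer to be covered by at least F+1 opened relay sites, so that any F relay faults leave it served. The sites, costs, and coverage sets are invented example data, and full gateway connectivity is deliberately left out.

```python
# Simplified fault-tolerant placement MILP with PuLP: open a cheapest
# set of relay sites so each radiogoniometer is covered F+1 times.

from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum

SITE_COST = {"s1": 5, "s2": 4, "s3": 6, "s4": 3, "s5": 7}
# COVERS[g] = candidate relay sites in radio range of radiogoniometer g
COVERS = {"g1": ["s1", "s2", "s4"],
          "g2": ["s2", "s3", "s5"],
          "g3": ["s1", "s3", "s4", "s5"]}
F = 1  # tolerate up to one faulty relay node

prob = LpProblem("relay_placement", LpMinimize)
open_site = {s: LpVariable(f"open_{s}", cat=LpBinary) for s in SITE_COST}

# Objective: minimise total deployment cost of the opened relay sites.
prob += lpSum(SITE_COST[s] * open_site[s] for s in SITE_COST)

# Fault tolerance: every radiogoniometer covered by >= F+1 open sites.
for g, sites in COVERS.items():
    prob += lpSum(open_site[s] for s in sites) >= F + 1

prob.solve()
chosen = [s for s in SITE_COST if open_site[s].value() == 1]
print(chosen, sum(SITE_COST[s] for s in chosen))
```

    A PB-SAT encoding of the same toy instance would express each coverage requirement as a pseudo-Boolean constraint over the same 0/1 site variables, with the cost sum as the optimisation objective.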